Motivated by studying asymptotic properties of the maximum
likelihood estimator (MLE) in stochastic volatility (SV) models, in
this paper we investigate likelihood estimation in state space models.
We first prove that, under some regularity conditions, there is a
consistent sequence of roots of the likelihood equation that is
asymptotically normal with the inverse of the Fisher information as its
variance. With the extra assumption that the likelihood equation has a
unique root for each n, there is a consistent sequence of estimators of
the unknown parameters. If, in addition, the supremum of the log
likelihood function is integrable, the MLE exists and is strongly
consistent. An Edgeworth expansion of the approximate solution of the
likelihood equation is also established. Several examples, including
Markov switching models, ARMA models, (G)ARCH models and stochastic
volatility (SV) models, are given for illustration.

1. Introduction. Motivated by studying asymptotic properties of the 
maximum likelihood estimator (MLE) in stochastic volatility (SV) models, 
in this paper we investigate likelihood estimation in state space models. A 
state space model is, loosely speaking, a sequence $\{\xi_n\}_{n=0}^{\infty}$
of random variables obtained in the following way. First, a realization of
a Markov chain $X = \{X_n, n \ge 0\}$ is created. This chain is sometimes
called the regime and is not observed. Then, conditional on $X$, the
$\xi$-variables are generated. Usually the dependence of $\xi_n$ on $X$ is
more or less local, as when $\xi_n = g(X_n, \xi_{n-1}, \eta_n)$ for some
function $g$ and random sequence $\{\eta_n\}$, independent of $X$. The
sequence $\xi_n$ itself is generally not Markov and may, in fact, have a
complicated dependence structure. When the state space of
$\{X_n, n \ge 0\}$ is finite, it is the so-called hidden Markov model or
Markov switching model.
The statistical modeling and computation for state space models have
attracted a great deal of attention recently because of their importance in 
applications to speech recognition [49], signal processing [17], ion channels 
[1], molecular biology [40] and economics [8, 19, 51]. The reader is referred 
to [20, 34, 41] for a comprehensive summary. The main focus of these efforts 
has been state space modeling and estimation, algorithms for fitting these 
models and the implementation of likelihood based methods. 

The state space model here is defined in a general sense, in which the ob- 
servations are conditionally Markovian dependent, and the state space of the 
driving Markov chain need not be finite or compact. When the state space 
is finite and the observation is a deterministic function of the state space, 
Baum and Petrie [3] established the consistency and asymptotic normality 
of the MLE. When the observed random variables are conditionally inde- 
pendent, Leroux [44] proved strong consistency of the MLE, while Bickel, 
Ritov and Ryden [7] established asymptotic normality of the MLE under 
mild conditions. Jensen and Petersen [39], Douc and Matias [14] and Douc, 
Moulines and Ryden [15] studied asymptotic properties of the MLE for 
general "pseudo-compact" state space models. By extending the inference 
problem to time series analysis where the state space is finite and the ob- 
served random variables are conditionally Markovian dependent, Goldfeld 
and Quandt [30] and Hamilton [33] considered the implementation of the 
maximum likelihood estimator in switching autoregressions with Markov 
regimes. Francq and Roussignol [21] studied the consistency of the MLE, 
while Fuh [23] established the Bahadur efficiency of the MLE in Markov 
switching models. We now give two examples of state space models. 

Example 1 [GARCH(p,q) model]. For given $p \ge 1$ and $q \ge 0$, let

(1.1)  $$Y_n = \sigma_n \varepsilon_n \quad\text{and}\quad \sigma_n^2 = \delta + \sum_{i=1}^{p} \alpha_i \sigma_{n-i}^2 + \sum_{j=1}^{q} \beta_j Y_{n-j}^2,$$

where $\delta > 0$, $\alpha_i \ge 0$ and $\beta_j \ge 0$ are constants,
$\varepsilon_n$ is a sequence of independent and identically distributed
(i.i.d.) random variables, and $\varepsilon_n$ is independent of
$\{Y_{n-k}, k \ge 1\}$ for all $n$. This is the celebrated GARCH(p,q) model
proposed by Bollerslev [8]. When $q = 0$ or $\beta_j = 0$ for
$j = 1, \dots, q$, this is the ARCH(p) model first considered by Engle [19].
The reader is referred to [9] and [20] for a comprehensive summary.

For convenience of notation, we assume that $p, q \ge 2$, adding some
$\alpha_i$ or $\beta_j$ equal to zero if necessary. Denote
$\eta_n = \sigma_n^{-1} Y_n$,
$\tau_n = (\alpha_1 + \beta_1 \eta_n^2, \alpha_2, \dots, \alpha_{p-1}) \in R^{p-1}$,
$\zeta_n = (\eta_n^2, 0, \dots, 0) \in R^{p-1}$,
$\beta = (\beta_2, \dots, \beta_{q-1}) \in R^{q-2}$,
and let $I_{p-1}$ and $I_{q-2}$ be identity matrices. Let $A_n$ be a
$(p+q-1) \times (p+q-1)$ matrix written in block form as

(1.2)  $$A_n = \begin{pmatrix} \tau_n & \alpha_p & \beta & \beta_q \\ I_{p-1} & 0 & 0 & 0 \\ \zeta_n & 0 & 0 & 0 \\ 0 & 0 & I_{q-2} & 0 \end{pmatrix}.$$

Note that $\{A_n, n \ge 0\}$ are i.i.d. random matrices.

Let $Z = (\delta, 0, \dots, 0)' \in R^{p+q-1}$ and
$X_n = (\sigma_{n+1}^2, \dots, \sigma_{n-p+2}^2, Y_n^2, \dots, Y_{n-q+2}^2)'$,
where "$'$" denotes transpose. Following the idea of Bougerol and
Picard [10], we have the following state space representation of the
GARCH(p,q) model: $X_n$ is a Markov chain governed by

(1.3)  $$X_{n+1} = A_{n+1} X_n + Z,$$

and $\xi_n := g(X_n) = (Y_n^2, \dots, Y_{n-q+2}^2)'$, the observed random
quantity, is a noninvertible function of $X_n$.
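
The representation (1.3) can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes standard normal innovations, arbitrary starting values, and the block layout of $A_n$ with first row $(\tau_n, \alpha_p, \beta, \beta_q)$; the helper names `build_A` and `max_discrepancy` are hypothetical. It simulates (1.1) directly and compares the result with the matrix recursion $X_{n+1} = A_{n+1} X_n + Z$.

```python
import random


def build_A(eta2, alpha, beta):
    """Block matrix A_n for a given eta_n^2 (assumed layout, p, q >= 2).

    alpha = (alpha_1, ..., alpha_p) multiplies lagged sigma^2,
    beta = (beta_1, ..., beta_q) multiplies lagged Y^2.
    """
    p, q = len(alpha), len(beta)
    d = p + q - 1
    A = [[0.0] * d for _ in range(d)]
    A[0][0] = alpha[0] + beta[0] * eta2      # first entry of tau_n
    for i in range(1, p):                    # alpha_2, ..., alpha_p
        A[0][i] = alpha[i]
    for j in range(1, q):                    # beta_2, ..., beta_q
        A[0][p - 1 + j] = beta[j]
    for i in range(p - 1):                   # I_{p-1}: shift the sigma^2 lags
        A[1 + i][i] = 1.0
    A[p][0] = eta2                           # zeta_n: Y^2 = eta^2 * sigma^2
    for j in range(q - 2):                   # I_{q-2}: shift the Y^2 lags
        A[p + 1 + j][p + j] = 1.0
    return A


def max_discrepancy(alpha, beta, delta, n_steps=200, seed=0):
    """Largest gap between the direct recursion and the matrix recursion."""
    rng = random.Random(seed)
    p, q = len(alpha), len(beta)
    m = max(p, q)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    sig2 = [1.0] * m                         # arbitrary starting values
    y2 = [sig2[t] * eps[t] ** 2 for t in range(m)]
    for t in range(m, n_steps):              # direct GARCH recursion
        s = delta
        s += sum(alpha[i] * sig2[t - 1 - i] for i in range(p))
        s += sum(beta[j] * y2[t - 1 - j] for j in range(q))
        sig2.append(s)
        y2.append(s * eps[t] ** 2)
    n = m - 1
    # X_n = (sigma^2_{n+1}, ..., sigma^2_{n-p+2}, Y^2_n, ..., Y^2_{n-q+2})'
    x = [sig2[n + 1 - i] for i in range(p)] + [y2[n - j] for j in range(q - 1)]
    z = [delta] + [0.0] * (p + q - 2)
    worst = 0.0
    while n + 2 < n_steps:
        A = build_A(eps[n + 1] ** 2, alpha, beta)
        x = [sum(A[r][c] * x[c] for c in range(len(x))) + z[r]
             for r in range(len(x))]
        n += 1
        worst = max(worst, abs(x[0] - sig2[n + 1]), abs(x[p] - y2[n]))
    return worst
```

The two recursions perform the same arithmetic in a different order, so the reported gap should be at the level of floating-point rounding.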

Example 2 (Stochastic volatility models). Let

(1.4)  $$Y_n = \sigma_n \varepsilon_n,$$

where $\log \sigma_n$ follows an AR(1) process and $\varepsilon_n$ is a
sequence of i.i.d. random variables with standard normal probability
density function. This is the discrete time stochastic volatility model
proposed by Taylor [51]. The reader is referred to [29, 50, 52] for a
comprehensive summary. Note that Genon-Catalot, Jeantheau and Laredo [27]
studied the ergodicity and mixing properties of stochastic volatility
models from the hidden Markov model point of view.

Write $X_n := \log \sigma_n^2$ and $Y_n = \sigma \varepsilon_n \exp(X_n/2)$,
where $\sigma$ is a scale parameter. Squaring the observations in the above
equation and taking logarithms gives
$\log Y_n^2 = \log \sigma^2 + X_n + \log \varepsilon_n^2$. Alternatively,
we have

(1.5)  $$\log Y_n^2 = \omega + X_n + \zeta_n,$$

where $\omega = \log \sigma^2 + E \log \varepsilon_n^2$, so that the
disturbance $\zeta_n$ has mean zero by construction. The scale parameter
$\sigma$ also removes the need for a constant term in the stationary
first-order autoregressive process

(1.6)  $$X_n = a X_{n-1} + \eta_n, \qquad |a| < 1,$$

where $\eta_n$ is a sequence of i.i.d. random variables distributed as
$N(0, \sigma_\eta^2)$. Moreover, we assume that $\zeta_n$ and $\eta_n$ are
independent. Note that in (1.5) and (1.6) the observed random quantity is
$\xi_n := \log Y_n^2$, and $\{X_n, n \ge 0\}$ forms a Markov chain with
transition probability

(1.7)  $$p(x_{k-1}, x_k) = (2\pi\sigma_\eta^2)^{-1/2} \exp\left\{ -\frac{(x_k - a x_{k-1})^2}{2\sigma_\eta^2} \right\}$$

and stationary distribution $\pi \sim N(0, \sigma_\eta^2/(1 - a^2))$.
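
Two constants in this example can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the paper: it assumes standard normal $\varepsilon_n$, for which $E \log \varepsilon_n^2 = -\gamma_E - \log 2$ with $\gamma_E$ the Euler-Mascheroni constant, and it obtains the stationary variance of the AR(1) state as the fixed point of $v \mapsto a^2 v + \sigma_\eta^2$. The function names and the quadrature step are hypothetical choices.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def mean_log_eps2(h=1e-4, lim=10.0):
    """Midpoint-rule approximation of E[log eps^2] for eps ~ N(0, 1).

    Uses symmetry of the normal density and integrates over (0, lim);
    the midpoint rule never evaluates the integrand at the singularity 0.
    """
    n = int(round(lim / h))
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += math.log(x * x) * math.exp(-0.5 * x * x)
    return 2.0 * total * h / math.sqrt(2.0 * math.pi)


def omega(sigma2):
    """Intercept of the log-squared observation equation for scale sigma^2."""
    return math.log(sigma2) - EULER_GAMMA - math.log(2.0)


def stationary_variance(a, sigma_eta2, n_iter=10_000):
    """Iterate v -> a^2 v + sigma_eta^2, the variance recursion of the AR(1)."""
    v = 0.0
    for _ in range(n_iter):
        v = a * a * v + sigma_eta2
    return v
```

For $|a| < 1$ the iteration contracts geometrically, so `stationary_variance(a, s)` should agree with the closed form `s / (1 - a * a)` up to rounding.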

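For given observations $y = (\log y_1^2, \dots, \log y_n^2)$ from the state
space model (1.5) and (1.6), the likelihood function of the parameter
$\theta = (a, \sigma_\eta^2)$ is

(1.8)  $$\int \cdots \int \pi(x_0) \prod_{k=1}^{n} p(x_{k-1}, x_k) \times f_\zeta(\log y_k^2 - \omega - x_k)\, dx_n \cdots dx_0,$$

where $f_\zeta(\cdot)$ is the probability density function of $\zeta_1$.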

A major difficulty in analyzing the likelihood function in state space
models is that it can be expressed only in integral form; see equation
(1.8), for instance. In this paper we provide a device which represents the
integral likelihood function as the $L_1$-norm of a Markovian iterated
random functions system. This new representation enables us to apply
results on the strong law of large numbers, the central limit theorem and
Edgeworth expansions for the distributions of Markov random walks, and to
verify strong consistency of the MLE, first-order efficiency, and an
Edgeworth expansion for the solution of the likelihood equation. Note that
third-order efficiency follows from the Edgeworth expansion by a standard
argument (cf. [28]). Another essential point worth mentioning is that we
introduce a weight function in a suitable way [see (4.1)-(4.3),
Assumptions K2, K3 and Definition 2 in Section 4, and C1 in Section 5] to
relax the condition of a compact state space for the underlying Markov
chain, and to cover several interesting examples.

The remainder of this paper is organized as follows. In Section 2 we define
the state space model as a general state Markov chain in a Markovian random
environment, and represent the likelihood function as the $L_1$-norm of
a Markovian iterated random functions system. In Section 3 we give a brief
summary of a Markovian iterated random functions system, and provide
an ergodic theorem and the strong law of large numbers. The multivariate
central limit theorem and Edgeworth expansion for a Markovian iterated
random functions system are given in Section 4. Section 5 contains our main
results on efficient likelihood estimation in state space models. First, we
compute the Fisher information and prove the existence of an efficient
estimator in a "Cramér fashion." Second, we characterize the
Kullback-Leibler information, and prove strong consistency of the MLE.
Last, we establish an Edgeworth expansion of the approximate solution of
the likelihood equation. In Section 6 we consider a few examples, including
Markov switching models, ARMA models, (G)ARCH models and SV models, which
are commonly used in financial economics. The proofs of the lemmas in
Section 5 are given in Section 7. Other technical proofs are deferred to
the Appendix.
2. State space models. A state space model is defined as a parameterized
Markov chain in a Markovian random environment with the underlying
environmental Markov chain viewed as missing data. Specifically, let
$X = \{X_n, n \ge 0\}$ be a Markov chain on a general state space
$\mathcal{X}$, with transition probability kernel
$P^\theta(x, \cdot) = P^\theta\{X_1 \in \cdot \mid X_0 = x\}$ and stationary
probability $\pi_\theta(\cdot)$, where $\theta \in \Theta \subset R^q$
denotes the unknown parameter. Suppose that a random sequence
$\{\xi_n\}_{n=0}^{\infty}$, taking values in $R^d$, is adjoined to the chain
such that $\{(X_n, \xi_n), n \ge 0\}$ is a Markov chain on
$\mathcal{X} \times R^d$ satisfying
$P^\theta\{X_1 \in A \mid X_0 = x, \xi_0 = s\} = P^\theta\{X_1 \in A \mid X_0 = x\}$
for $A \in \mathcal{B}(\mathcal{X})$, the $\sigma$-algebra of $\mathcal{X}$.
Conditioning on the full $X$ sequence, $\xi_n$ is a Markov chain with
probability

(2.1)  $$P^\theta\{\xi_{n+1} \in B \mid X_0, X_1, \dots; \xi_0, \xi_1, \dots, \xi_n\} = P^\theta\{\xi_{n+1} \in B \mid X_{n+1}; \xi_n\} \quad \text{a.s.}$$

for each $n$ and $B \in \mathcal{B}(R^d)$, the Borel $\sigma$-algebra on
$R^d$. Note that in (2.1) the conditional probability of $\xi_{n+1}$ depends
on $X_{n+1}$ and $\xi_n$ only. Furthermore, we assume the existence of a
transition probability density $p_\theta(x, y)$ for the Markov chain
$\{X_n, n \ge 0\}$ with respect to a $\sigma$-finite measure $m$ on
$\mathcal{X}$ such that

(2.2)  $$P^\theta\{X_1 \in A, \xi_1 \in B \mid X_0 = x, \xi_0 = s_0\} = \int_{y \in A} \int_{s \in B} p_\theta(x, y) f(s; \theta \mid y, s_0)\, Q(ds)\, m(dy),$$

where $f(\xi_k; \theta \mid X_k, \xi_{k-1})$ is the conditional probability
density of $\xi_k$ given $\xi_{k-1}$ and $X_k$, with respect to a
$\sigma$-finite measure $Q$ on $R^d$. We also assume that the Markov chain
$\{(X_n, \xi_n), n \ge 0\}$ has a stationary probability with probability
density function $\pi(x) f(\cdot; \theta \mid x)$ with respect to
$m \times Q$. In this paper we consider
$\theta = (\theta_1, \dots, \theta_q) \in \Theta \subset R^q$ as the unknown
parameter, and the true parameter value is denoted by $\theta_0$. We will
use $\pi(x)$ for $\pi_\theta(x)$, $p(x, y)$ for $p_\theta(x, y)$,
$f(\xi_0 \mid X_0)$ for $f(\xi_0; \theta \mid X_0)$, and
$f(\xi_k \mid X_k, \xi_{k-1})$ for $f(\xi_k; \theta \mid X_k, \xi_{k-1})$,
here and in the sequel, depending on our convenience. Now we give a formal
definition as follows.

Definition 1. $\{\xi_n, n \ge 0\}$ is called a state space model if there
is a Markov chain $\{X_n, n \ge 0\}$ such that the process
$\{(X_n, \xi_n), n \ge 0\}$ satisfies (2.1).

Note that this setting includes several interesting examples: the
Markov-switching Gaussian autoregression of Hamilton [33], (G)ARCH models
of Engle [19] and Bollerslev [8], and SV models of Clark [12] and
Taylor [51]. When the state space $\mathcal{X}$ is finite or compact, this
reduces to the hidden Markov model considered by Francq and
Roussignol [21], Fuh [22, 23, 25] and Douc, Moulines and Ryden [15]. Denote
$S_n = \sum_{t=1}^{n} \xi_t$. When $\xi_n$ are conditionally independent
given $X$, the Markov chain $\{(X_n, S_n), n \ge 0\}$ is called a Markov
additive process and $S_n$ is called a Markov random walk. Furthermore, if
the state space $\mathcal{X}$ is finite, $\{\xi_n, n \ge 0\}$ is the hidden
Markov model studied by Leroux [44], Bickel and Ritov [6] and Bickel, Ritov
and Ryden [7]. When the state space $\mathcal{X}$ is "pseudo-compact" and
$\xi_n$ are conditionally independent given $X$, $\{\xi_n, n \ge 0\}$ is the
state space model considered in [39] and [14].

For given observations $s_0, s_1, \dots, s_n$ from a state space model
$\{\xi_n, n \ge 0\}$, the likelihood function is

(2.3)  $$p_n(s_0, s_1, \dots, s_n; \theta) = \int_{x_0 \in \mathcal{X}} \cdots \int_{x_n \in \mathcal{X}} \pi_\theta(x_0) f(s_0; \theta \mid x_0) \times \prod_{j=1}^{n} p_\theta(x_{j-1}, x_j) f(s_j; \theta \mid x_j, s_{j-1})\, m(dx_n) \cdots m(dx_0).$$

Recall that $\pi_\theta(x_0) f(s_0; \theta \mid x_0)$ is the stationary
probability density with respect to $m \times Q$ of the Markov chain
$\{(X_n, \xi_n), n \ge 0\}$.

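To represent the likelihood $p_n(\xi_0, \xi_1, \dots, \xi_n; \theta)$ as the
$L_1$-norm of a Markovian iterated random functions system, let

(2.4)  $$\mathbf{M} = \left\{ h \,\Big|\, h : \mathcal{X} \to R^+ \text{ is } m\text{-measurable and } \int_{x \in \mathcal{X}} h(x)\, m(dx) < \infty \right\}.$$

For each $j = 1, \dots, n$, define the random functions
$\mathbf{P}_\theta(\xi_0)$ and $\mathbf{P}_\theta(\xi_j)$ on
$(\mathcal{X} \times R^d) \times \mathbf{M}$ as

(2.5)  $$\mathbf{P}_\theta(\xi_0) h(x) = \int_{x \in \mathcal{X}} f(\xi_0; \theta \mid x) h(x)\, m(dx), \quad \text{a constant},$$

(2.6)  $$\mathbf{P}_\theta(\xi_j) h(x) = \int_{y \in \mathcal{X}} p_\theta(x, y) f(\xi_j; \theta \mid y, \xi_{j-1}) h(y)\, m(dy).$$

Define the composition of two random functions as

(2.7)  $$\mathbf{P}_\theta(\xi_{j+1}) \circ \mathbf{P}_\theta(\xi_j) h(x) = \int_{z \in \mathcal{X}} p_\theta(x, z) f(\xi_j; \theta \mid z, \xi_{j-1}) \left( \int_{y \in \mathcal{X}} p_\theta(z, y) f(\xi_{j+1}; \theta \mid y, \xi_j) h(y)\, m(dy) \right) m(dz).$$

For $h \in \mathbf{M}$, denote $\|h\| := \int_{x \in \mathcal{X}} h(x)\, m(dx)$
as the $L_1$-norm on $\mathbf{M}$ with respect to $m$. Then the likelihood
$p_n(\xi_0, \xi_1, \dots, \xi_n; \theta)$ can be represented as

(2.8)  $$p_n(\xi_0, \xi_1, \dots, \xi_n; \theta) = \int_{x_0 \in \mathcal{X}} \cdots \int_{x_n \in \mathcal{X}} \pi_\theta(x_0) f(\xi_0; \theta \mid x_0) \times \prod_{j=1}^{n} p_\theta(x_{j-1}, x_j) f(\xi_j; \theta \mid x_j, \xi_{j-1})\, m(dx_n) \cdots m(dx_0) = \| \mathbf{P}_\theta(\xi_n) \circ \cdots \circ \mathbf{P}_\theta(\xi_1) \circ \mathbf{P}_\theta(\xi_0) \pi_\theta \|.$$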

Note that, for $j = 1, \dots, n$, the integrand
$p_\theta(x, y) f(\xi_j; \theta \mid y, \xi_{j-1})$ of
$\mathbf{P}_\theta(\xi_j)$ in (2.6) and (2.8) represents $X_{j-1} = x$ and
$X_j \in dy$, and $\xi_j$ is a Markov chain with transition probability
density $f(\xi_j; \theta \mid y, \xi_{j-1})$ for given $X$. By definition
(2.1), $\{(X_n, \xi_n), n \ge 0\}$ is a Markov chain, and this implies that
$\mathbf{P}_\theta(\xi_j)$ is a sequence of Markovian iterated random
functions systems (see Section 3 for a formal definition). Therefore, by
representation (2.8), $p_n(\xi_0, \xi_1, \dots, \xi_n; \theta)$ is the
$L_1$-norm of a Markovian iterated random functions system.
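
For a finite state space the iterated-integral likelihood reduces to sums, and the idea of evaluating it one observation at a time can be illustrated directly. The sketch below is a finite-state analogue under simplifying assumptions, not the paper's operators verbatim: $m$ is counting measure, the observations are conditionally independent given the chain (so $f$ does not depend on $\xi_{j-1}$), and the emission density is Gaussian with unit variance. It propagates an unnormalized density forward (the standard forward recursion) and takes its $L_1$-norm at the end, then compares the value with a brute-force sum over all state paths. All function names are hypothetical.

```python
import itertools
import math


def normal_pdf(s, mean, sd=1.0):
    """Density of N(mean, sd^2) at s."""
    z = (s - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_forward(obs, pi, P, means):
    """Likelihood as the L1-norm of an iteratively updated function h."""
    k = len(pi)
    # h_0(x) = pi(x) f(s_0 | x)
    h = [pi[x] * normal_pdf(obs[0], means[x]) for x in range(k)]
    for s in obs[1:]:
        # h_j(y) = sum_x h_{j-1}(x) p(x, y) f(s_j | y)
        h = [sum(h[x] * P[x][y] for x in range(k)) * normal_pdf(s, means[y])
             for y in range(k)]
    return sum(h)  # L1-norm with respect to counting measure


def likelihood_bruteforce(obs, pi, P, means):
    """Direct evaluation: sum the integrand over every state path."""
    k = len(pi)
    total = 0.0
    for path in itertools.product(range(k), repeat=len(obs)):
        w = pi[path[0]] * normal_pdf(obs[0], means[path[0]])
        for j in range(1, len(obs)):
            w *= P[path[j - 1]][path[j]] * normal_pdf(obs[j], means[path[j]])
        total += w
    return total
```

The forward evaluation costs on the order of $n k^2$ operations, against $k^{n+1}$ paths for the direct sum, which is the practical reason for working with the iterated form.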

3. Ergodic theorems for a Markovian iterated random functions system. 

To analyze the asymptotic properties of efficient likelihood estimators in 
state space models, in this section we study the ergodic theorem and the 
strong law of large numbers for a Markovian iterated random functions sys- 
tem. The Markovian iterated random functions system is a generalization 
of an iterated random functions system, in which the random functions are 
driven by a Markov chain. For a general account of an iterated random 
functions system, the reader is referred to [13] for a recent survey. 

For simplicity in our notation, let $\{Y_n, n \ge 0\}$ [instead of
$\{(X_n, \xi_n), n \ge 0\}$ in Section 2] be a Markov chain on a general
state space $\mathcal{Y}$ with $\sigma$-algebra $\mathcal{A}$, which is
irreducible with respect to a maximal irreducibility measure on
$(\mathcal{Y}, \mathcal{A})$ and is aperiodic. The transition kernel is
denoted by $P(y, A)$. Let $(M, d)$ be a complete separable metric space with
Borel $\sigma$-algebra $\mathcal{B}(M)$. Denote by $M_0$ a random variable
which is independent of $\{Y_n, n \ge 0\}$. A sequence of the form

(3.1)  $$M_n = F(Y_n, M_{n-1}), \qquad n \ge 1,$$

taking values in $(M, d)$ is called a Markovian iterated random functions
system (MIRFS) of Lipschitz functions provided the following hold:

(1) $\{Y_n, n \ge 0\}$ is a Markov chain taking values in a second countable
measurable space $(\mathcal{Y}, \mathcal{A})$, with transition probability
kernel $P(\cdot, \cdot)$ and stationary probability $\pi$, and $M_0$ is a
random element on a probability space $(\Omega, \mathcal{F}, P)$, which is
independent of $\{Y_n, n \ge 0\}$;






C.-D. FUH 



(2) $F : (\mathcal{Y} \times M, \mathcal{A} \otimes \mathcal{B}(M)) \to (M, \mathcal{B}(M))$
is jointly measurable and Lipschitz continuous in the second argument.

Clearly, $\{(Y_n, M_n), n \ge 0\}$ constitutes a Markov chain with state
space $\mathcal{Y} \times M$ and transition probability kernel
$\mathbf{P}$, given by

(3.2)  $$\mathbf{P}((y, u), A \times B) := \int_{z \in A} I_B(F(z, u))\, P(y, dz)$$

for all $y \in \mathcal{Y}$, $u \in M$, $A \in \mathcal{A}$ and
$B \in \mathcal{B}(M)$, where $I$ denotes the indicator function. The
$n$-step transition kernel is denoted $\mathbf{P}^n$. For
$(y, u) \in \mathcal{Y} \times M$, let $P_{yu}$ be the probability measure on
the underlying measurable space under which $Y_0 = y$, $M_0 = u$ a.s. The
associated expectation is denoted $E_{yu}$, as usual. For an arbitrary
distribution $\nu$ on $\mathcal{Y} \times M$, we put
$P_\nu(\cdot) := \int P_{yu}(\cdot)\, \nu(dy \times du)$ with associated
expectation $E_\nu$. We use $P$ and $E$ for probabilities and expectations,
respectively, that do not depend on the initial distribution.

Let $\mathbb{M}_0$ be a dense subset of $M$ and $\mathcal{M}(\mathbb{M}_0, M)$
the space of all mappings $h : \mathbb{M}_0 \to M$ endowed with the product
topology and product $\sigma$-algebra. Then the space
$\mathcal{L}_{Lip}(M, M)$ of all Lipschitz continuous mappings
$h : M \to M$ properly embedded forms a Borel subset of
$\mathcal{M}(\mathbb{M}_0, M)$, and the mappings

$$\mathcal{L}_{Lip}(M, M) \times M \ni (h, u) \mapsto h(u) \in M,$$

$$\mathcal{L}_{Lip}(M, M) \ni h \mapsto l(h) := \sup_{u \ne v} \frac{d(h(u), h(v))}{d(u, v)}$$

are Borel; see Lemma 5.1 in [13] for details. Hence,

(3.3)  $$L_n := l(F(Y_n, \cdot)), \qquad n \ge 0,$$

are also measurable and form a sequence of Markovian dependent random
variables.

An important point in characterizing the limit in the ergodic theorem will
be the right use of the idea of duality. For this purpose, we introduce a
time-reversed (or dual) Markov chain $\{\tilde{Y}_n, n \ge 0\}$ of
$\{Y_n, n \ge 0\}$ as follows. Assume that there exists a $\sigma$-finite
measure $m$ on $(\mathcal{Y}, \mathcal{A})$ such that the probability
measure $P$ on $(\mathcal{Y}, \mathcal{A})$ defined by
$P(A) = P(Y_1 \in A \mid Y_0 = y)$ is absolutely continuous with respect to
$m$, so that $P(A) = \int_A p(y, z)\, m(dz)$ for all $A \in \mathcal{A}$,
where $p(y, \cdot) = dP/dm$. The Markov chain $\{Y_n, n \ge 0\}$ is assumed
to have an invariant probability measure $\pi$ which has a positive
probability density function $\pi$ (without any confusion, we still use the
same notation) with respect to $m$. We shall use $\sim$ to refer to the
time-reversed (or dual) process $\{\tilde{Y}_n, n \ge 0\}$ with transition
probability density

(3.4)  $$\tilde{p}(z, y) = p(y, z)\, \pi(y) / \pi(z).$$

Denote $\tilde{P}$ as the corresponding probability. It is easy to see that
both $Y_n$ and $\tilde{Y}_n$ have the same stationary distribution $\pi$.
In this section we will assume that the initial distribution of $Y_0$ is the
stationary distribution $\pi$.
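
For a finite state space with counting measure, the dual density (3.4) is a matrix that can be computed directly. The following sketch is an illustration only (the function names are hypothetical and the stationary law is obtained by simple power iteration, which assumes an irreducible aperiodic chain). It builds the time-reversed kernel and allows one to check that its rows sum to one and that it shares the stationary distribution of the original chain.

```python
def stationary(P, n_iter=5000):
    """Stationary row vector of a finite transition matrix by power iteration."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(n_iter):
        pi = [sum(pi[x] * P[x][y] for x in range(k)) for y in range(k)]
    return pi


def dual_kernel(P, pi):
    """Time-reversed kernel: entry (z, y) is p(y, z) * pi(y) / pi(z)."""
    k = len(P)
    return [[P[y][z] * pi[y] / pi[z] for y in range(k)] for z in range(k)]


def apply_left(pi, P):
    """Row vector times matrix: the law after one step started from pi."""
    k = len(P)
    return [sum(pi[x] * P[x][y] for x in range(k)) for y in range(k)]
```

Summing row $z$ of the dual kernel gives $\sum_y p(y, z)\pi(y)/\pi(z) = \pi(z)/\pi(z) = 1$, which is exactly the invariance of $\pi$ under the original chain.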

In the following, we write $F_n(u)$ for $F(Y_n, u)$. For all
$1 \le k \le n$, let $F_{k:n} := F_k \circ \cdots \circ F_n$ and
$F_{n:k} := F_n \circ \cdots \circ F_k$, where $\circ$ denotes the
composition of functions. Denote $F_{n:n-1}$ as the identity on $M$. Hence

(3.5)  $$M_n = F_n(M_{n-1}) = F_{n:1}(M_0)$$

for all $n \ge 0$. Closely related to these forward iterations, and in fact
a key tool in the analysis of the ergodic property, is the sequence of
backward iterations

(3.6)  $$\hat{M}_n := F_{1:n}(M_0), \qquad n \ge 0.$$

The connection is established by the identity

(3.7)  $$\pi(y)\, P(M_n \in \cdot \mid Y_0 = y) = \pi(z)\, \tilde{P}(\hat{M}_n \in \cdot \mid \tilde{Y}_0 = z)$$

for all $n \ge 0$. Put also $M_n^u := F_{n:1}(u)$ and
$\hat{M}_n^u := F_{1:n}(u)$ for $u \in M$ and note that

(3.8)  $$\int\!\!\int P\big((M_n^u, \hat{M}_n^u)_{n \ge 0} \in \cdot \mid Y_0 = y, \tilde{Y}_0 = z\big)\, \pi(dy)\, \pi(dz) = \int_{z \in \mathcal{Y}} \int_{y \in \mathcal{Y}} P\big((M_n, \hat{M}_n)_{n \ge 0} \in \cdot \mid Y_0 = y, \tilde{Y}_0 = z\big)\, \pi(dy)\, \pi(dz).$$

Note that in (3.8), the probability $P$ denotes a joint probability.

$\{Y_n, n \ge 0\}$ is called Harris recurrent if there exist a set
$A \in \mathcal{A}$, a probability measure $\Gamma$ concentrated on $A$ and
an $\varepsilon$ with $0 < \varepsilon < 1$ such that
$P_y(Y_n \in A \text{ i.o.}) = 1$ for all $y \in \mathcal{Y}$ and,
furthermore, there exists $n$ such that
$P^n(y, A') \ge \varepsilon \Gamma(A')$ for all $y \in A$ and all
$A' \in \mathcal{A}$.

A central question for an MIRFS $(M_n)_{n \ge 0}$ is under which conditions
it stabilizes, that is, converges to a stationary distribution $\Pi$. The
next theorem summarizes the results regarding this question.

Theorem 1. Let $\{Y_n, n \ge 0\}$ be an aperiodic, irreducible and Harris
recurrent Markov chain, and let $(M_n)_{n \ge 0}$ be an MIRFS of Lipschitz
functions. Suppose the initial distribution of $Y_0$ is $\pi$, and

(3.9)  $$E \log l(F_1) < 0 \quad\text{and}\quad E \log^+ d(F_1(u_0), u_0) < \infty$$

for some $u_0 \in M$. Then the following assertions hold:

(i) $\hat{M}_n$ converges a.s. to a random element $M_\infty$ which does not
depend on the initial distribution.

(ii) $M_n$ converges in distribution to $M_\infty$ under $P$.

(iii) Define $\Pi$ as the stationary distribution of $(Y_\infty, M_\infty)$.
Then $\Pi$ is the unique stationary probability of the Markov chain
$\{(Y_n, M_n), n \ge 0\}$.

(iv) $(M_n)_{n \ge 0}$ is ergodic under $P_\Pi$, that is, for any $u \in M$,

(3.10)  $$\frac{1}{n} \sum_{k=1}^{n} g(M_k^u) \longrightarrow E_\Pi(g(M_\infty)), \qquad P_\Pi\text{-a.s.}$$

for bounded continuous real-valued functions $g$ on $M$.

We remark that Elton [18] showed in the situation of a stationary sequence
$(F_n)_{n \ge 1}$ that Theorem 1 holds whenever $E \log^+ l(F_1)$ and
$E \log^+ d(F_1(u_0), u_0)$ are both finite for some (and then all)
$u_0 \in M$ and the Lyapunov exponent
$\gamma := \lim_{n \to \infty} n^{-1} \log l(F_{n:1})$, which exists by
Kingman's subadditive ergodic theorem, is a.s. negative. Since the initial
distribution of $Y_0$ is the stationary distribution $\pi$, the Markov chain
$Y_n$ is a stationary sequence, and hence, $M_n$ is a sequence of iterated
random functions generated by stationary sequences. Here, we impose the
Harris recurrent condition so that the invariant measure $\pi$ exists, and
we are able to characterize the limit in a Markovian setting. Since the
proof is similar to that in [2], it is omitted.
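
The mean contraction condition in (3.9) allows individual maps to be expansive as long as the logarithm of the Lipschitz constant is negative on average. The toy sketch below is not from the paper; it assumes a two-state driving chain with stationary law $(2/3, 1/3)$ and affine maps $F(y, u) = a_y u + b_y$ on the real line, one of which has slope $1.2 > 1$. It shows forward iterations started from different points coalescing, and backward iterations stabilizing as more maps are composed. All names and parameter values are hypothetical.

```python
import math
import random

A = (0.5, 1.2)            # slopes a_y; the map for y = 1 is expansive
B = (1.0, -0.5)           # intercepts b_y
P = ((0.7, 0.3), (0.6, 0.4))
PI = (2.0 / 3.0, 1.0 / 3.0)   # stationary law of P


def F(y, u):
    """Affine Lipschitz map F(y, .) with Lipschitz constant |A[y]|."""
    return A[y] * u + B[y]


def simulate_chain(n, rng):
    """Return Y_1, ..., Y_n with Y_0 drawn from the stationary law."""
    y = 0 if rng.random() < PI[0] else 1
    path = []
    for _ in range(n):
        y = 0 if rng.random() < P[y][0] else 1
        path.append(y)
    return path


def forward(path, u):
    """Forward iteration F_n o ... o F_1 (u)."""
    for y in path:
        u = F(y, u)
    return u


def backward(path, u):
    """Backward iteration F_1 o ... o F_n (u)."""
    for y in reversed(path):
        u = F(y, u)
    return u


def mean_log_lipschitz():
    """E log l(F_1) under the stationary law; negative means contraction."""
    return sum(PI[y] * math.log(abs(A[y])) for y in (0, 1))
```

With these values `mean_log_lipschitz()` is about $-0.40$, so the product of slopes along a typical path decays geometrically even though one of the two maps expands.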

4. Central limit theorem and Edgeworth expansion for distributions of
a Markovian iterated random functions system. Consider the Markovian
iterated random functions system $\{(Y_n, M_n), n \ge 0\}$ defined in (3.1).
Abusing the notation slightly, let $g$ be an $R^p$-valued function on $M$.
In this section we study the central limit theorem and Edgeworth expansion
of the sum $S_n = \sum_{k=1}^{n} g(M_k)$ and of $g(n^{-1} S_n)$ for a smooth
function $g : R^p \to R^q$. Let $w : \mathcal{Y} \to [1, \infty)$ be a
measurable function, and let $\mathcal{B}$ be the Banach space of measurable
functions $h : \mathcal{Y} \to C$ (:= the set of complex numbers) with
$\|h\|_w := \sup_y |h(y)|/w(y) < \infty$. Assume further that
$\{Y_n, n \ge 0\}$ has a stationary distribution $\pi$ with
$\int w(y)\, \pi(dy) < \infty$, and

(4.1)  $$\lim_{n \to \infty} \sup_{y} \left\{ \frac{\left| E[h(Y_n) \mid Y_0 = y] - \int h(z)\, \pi(dz) \right|}{w(y)} : y \in \mathcal{Y},\ |h| \le w \right\} = 0,$$

(4.2)  $$\sup_{y} \left\{ E[w(Y_p) \mid Y_0 = y] / w(y) \right\} < \infty,$$

for some $p \ge 1$. Condition (4.1) says that the chain is $w$-uniformly
ergodic, which implies that there exist $\gamma > 0$ and $0 < \rho < 1$ such
that, for all $h \in \mathcal{B}$ and $n \ge 1$,

(4.3)  $$\sup_{y} \left| E[h(Y_n) \mid Y_0 = y] - \int h(z)\, \pi(dz) \right| \Big/ w(y) \le \gamma \rho^n \|h\|_w$$

(cf. pages 382-383 and Theorem 16.0.1 of [46]). We remark that, for
$w \equiv 1$, condition (4.1) is the classical uniform ergodicity condition
for $\{Y_n, n \ge 0\}$. The following assumption will be in force throughout
this section.
Assumption K.

K1. Let $\{Y_n, n \ge 0\}$ be an aperiodic, irreducible Markov chain
satisfying conditions (4.1)-(4.2). Furthermore, we assume the initial
distribution of $Y_0$ is $\pi$.

K2. The MIRFS $(M_n)_{n \ge 0}$ has the weighted mean contraction property,
that is, there exists a $p \ge 1$ such that



Remark 1. (a) Assumption K1 is a condition on the underlying Markov chain
$\{Y_n, n \ge 0\}$ which is general enough to include several models used in
practice, which are studied in Section 6. Assumption K2 is a weighted mean
contraction condition which is different from the standard mean contraction
condition $E \log L_1 < 0$ used in Theorem 1. Assumption K3 is a weighted
moment condition. Note that under Assumptions K1-K3, and the extra
assumption that $\{(Y_n, M_n), n \ge 0\}$ is an irreducible, aperiodic and
Harris recurrent Markov chain, Theorems 13.0.1 and 17.0.1(i) of [46] imply
that Theorem 1 still holds. Furthermore, we will prove the central limit
theorem and Edgeworth expansion for the distributions of a Markovian
iterated random functions system in Theorem 2.

(b) To have a better understanding of Assumption K, we consider a simple
state space model. Given $p \ge 1$ as in Assumption K2, and $|a| < 1$, let
$Y_n = a Y_{n-1} + \varepsilon_n$ and
$\xi_n = \beta_{Y_n} \xi_{n-1} + \eta_n$, where $\varepsilon_n$ are i.i.d.
random variables with $E|\varepsilon_1| = c < \infty$, and $\eta_n$ are
i.i.d. random variables with $E|\eta_1| < \infty$. Further, we assume both
$\varepsilon_1$ and $\eta_1$ have positive probability density functions
with respect to Lebesgue measure, and that they are mutually independent.
Denote $b = (1 - |a|^p)/(1 - |a|)$ and $\alpha = 1/(bc + 1) < 1$, and assume
$|\beta_y| < \alpha^{1/p} < 1$ for all $y \in \mathcal{Y}$. It is known that
one can take $w(y) = |y| + 1$ (cf. pages 380 and 383 of [46]). Let
$d(u, v) = |u - v|$. It is easy to see that Assumption K1 and the first part
of Assumption K3 hold. To check Assumption K2, we have




K3. There exists $u_0 \in M$ for which






(4.4)  $$\cdots = \log \sup_{y} \left\{ \frac{\alpha\,(|a|^p |y| + bc + 1)}{|y| + 1} \right\} = 0.$$

By using the same argument, we have the second part of Assumption K3.
When $\varepsilon_n$ are i.i.d. $N(0, 1)$, $\eta_n$ are i.i.d. $N(0, 1)$,
and they are mutually independent, we have
$\alpha = \sqrt{2\pi}/(2b + \sqrt{2\pi}) < 1$.
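
The constants in this example can be traced step by step. The following worked computation is a sketch under the assumptions of Remark 1(b), namely $Y_n = a Y_{n-1} + \varepsilon_n$ with $E|\varepsilon_1| = c$ and weight $w(y) = |y| + 1$; here $\alpha$ denotes the constant $1/(bc + 1)$.

```latex
\begin{align*}
E[w(Y_p) \mid Y_0 = y]
  &= E\Big| a^p y + \sum_{k=1}^{p} a^{p-k} \varepsilon_k \Big| + 1
   \le |a|^p |y| + c \sum_{k=0}^{p-1} |a|^k + 1
   = |a|^p |y| + bc + 1, \\
\sup_{y} \frac{|a|^p |y| + bc + 1}{|y| + 1}
  &= bc + 1 = \frac{1}{\alpha}
  \qquad \text{(attained at } y = 0 \text{, since } |a|^p < 1 \le bc + 1\text{)}, \\
\varepsilon_1 \sim N(0, 1):\quad
  c &= E|\varepsilon_1| = \sqrt{2/\pi},
  \qquad
  \alpha = \frac{1}{b\sqrt{2/\pi} + 1}
         = \frac{\sqrt{2\pi}}{2b + \sqrt{2\pi}} .
\end{align*}
```

The second line explains why the supremum of $\alpha(|a|^p|y| + bc + 1)/(|y| + 1)$ over $y$ equals one, so that its logarithm vanishes.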

Recall that $\Pi$ is defined in Theorem 1(iii) and denote
$Q(B) := \Pi(\mathcal{Y} \times B)$ for all $B \in \mathcal{B}(M)$. Let
$g \in \mathcal{L}_0^2(Q)$ be a square integrable function taking values in
$R^p$ with mean 0, that is, $g = (g_1, \dots, g_p)$ with each $g_k$ a
real-valued function on $M$, and

(4.5)  $$\int_M g_k(u)\, Q(du) = 0, \qquad \|g_k\|_2^2 = \int_M g_k^2(u)\, Q(du) < \infty,$$

for $k = 1, \dots, p$. Consider the sequence

(4.6)  $$S_n = S_n(g) = g(M_1) + \cdots + g(M_n), \qquad n \ge 1,$$

which may be viewed as a Markov random walk on the Markov chain
$\{(Y_n, M_n), n \ge 0\}$.

Note that there are two special properties of the Markov chain induced 
by the Markovian iterated random functions system (2.4)-(2.7). First, the 
hypothesis that the transition probability possesses a density leads to a 
classical situation in the context of the so-called "Doeblin condition" for 
Markov chains. Second, a positivity hypothesis on M defined in (2.4) in 
the support of the Markov chain leads to contraction properties, on which 
basis we will develop the spectral theory. The reader is referred to [37] for 
a general account of the perturbation theory of Markovian operators. We 
need the following notation first. 

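Definition 2. Let $w : \mathcal{Y} \to [1, \infty)$ be a weight function.
For any measurable function $\varphi : \mathcal{Y} \times M \to [1, \infty)$,
given $u_0 \in M$, define

$$\|\varphi\|_w := \sup_{y \in \mathcal{Y},\, u \in M} \frac{|\varphi(y, u)|}{w(y)}$$

and

$$\|\varphi\|_h := \sup_{y \in \mathcal{Y},\, u, v:\, 0 < d(u, v) \le 1} \frac{|\varphi(y, u) - \varphi(y, v)|}{(w(y)\, d(u, v))^{\delta}}$$

for $0 < \delta < 1$. We define $\mathcal{H}$ as the set of $\varphi$ on
$\mathcal{Y} \times M$ for which
$\|\varphi\|_{wh} := \|\varphi\|_w + \|\varphi\|_h$ is finite, where $wh$
represents a combination of the weighted variation norm and the bounded
weighted Hölder norm.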

Let $\nu$ be an initial distribution of $(Y_0, M_0)$ and let $E_\nu$ denote
expectation under the initial distribution $\nu$ on $(Y_0, M_0)$. For
$\varphi \in \mathcal{H}$, $g \in \mathcal{L}^2(Q)$, $y \in \mathcal{Y}$,
$u \in M$ and $p \times 1$ vectors
$\alpha = (\alpha_1, \dots, \alpha_p)' \in R^p$, define linear operators
$T_\alpha$, $T$, $\nu_\alpha$ and $Q$ on the space $\mathcal{H}$ as

(4.7)  $$(T_\alpha \varphi)(y, u) = E\{ e^{i\alpha' g(M_1)} \varphi(Y_1, M_1) \mid Y_0 = y, M_0 = u \},$$

(4.8)  $$(T \varphi)(y, u) = E\{ \varphi(Y_1, M_1) \mid Y_0 = y, M_0 = u \},$$

(4.9)  $$\nu_\alpha \varphi = E_\nu\{ e^{i\alpha' g(M_0)} \varphi(Y_0, M_0) \}, \qquad Q\varphi = E_\Pi\{ \varphi(Y_0, M_0) \}.$$

In the case of a $w$-uniformly ergodic Markov chain, Fuh and Lai [26] have
shown that there exists a sufficiently small $\delta > 0$ such that, for
$|\alpha| \le \delta$,
$\mathcal{H} = \mathcal{H}_1(\alpha) \oplus \mathcal{H}_2(\alpha)$ and

(4.10)  $$T_\alpha Q_\alpha \varphi = \lambda(\alpha) Q_\alpha \varphi \qquad \text{for all } \varphi \in \mathcal{H},$$

where $\mathcal{H}_1(\alpha)$ is a one-dimensional subspace of
$\mathcal{H}$, $\lambda(\alpha)$ is the eigenvalue of $T_\alpha$ with
corresponding eigenspace $\mathcal{H}_1(\alpha)$ and $Q_\alpha$ is the
parallel projection of $\mathcal{H}$ onto the subspace
$\mathcal{H}_1(\alpha)$ in the direction of $\mathcal{H}_2(\alpha)$.
Extension of their argument to the weight functions $w$ and $l$ defined in
Definition 2 is given in the Appendix, which also proves the following
lemmas.

Lemma 1. Let $\{(Y_n, M_n), n \ge 0\}$ be the MIRFS of Lipschitz functions
defined in (3.1) and satisfying Assumption K. Assume
$g \in \mathcal{L}^r(Q)$ for some $r \ge 2$. Then $T$ and $Q$ are bounded
linear operators on the Banach space $\mathcal{H}$ with norm
$\|\cdot\|_{wh}$, and satisfy

(4.11)  $$\|T^n - Q\|_{wh} = \sup_{\varphi \in \mathcal{H},\, \|\varphi\|_{wh} \le 1} \|T^n \varphi - Q\varphi\|_{wh} \le \gamma_* \rho_*^n,$$

for some $\gamma_* > 0$ and $0 < \rho_* < 1$.

By using an argument similar to Proposition 1 of [24], we have the
following:

Lemma 2. Let $\{(Y_n, M_n), n \ge 0\}$ be the MIRFS defined in (3.1)
satisfying Assumption K, such that the induced Markov chain
$\{(Y_n, M_n), n \ge 0\}$ with transition probability kernel (3.2) is
irreducible, aperiodic and Harris recurrent. Assume
$g \in \mathcal{L}^r(Q)$ for some $r \ge 2$. Then there exists $\delta > 0$
such that, for $\alpha \in R^p$ with $|\alpha| \le \delta$, and for
$\varphi \in \mathcal{H}$,

(4.12)  $$E_\nu\{ e^{i\alpha' S_n} \varphi(Y_n, M_n) \} = \nu_\alpha T_\alpha^n \varphi = \nu_\alpha T_\alpha^n \{ Q_\alpha + (I - Q_\alpha) \} \varphi = \lambda^n(\alpha)\, \nu_\alpha Q_\alpha \varphi + \nu_\alpha T_\alpha^n (I - Q_\alpha) \varphi,$$

and:

(i) $\lambda(\alpha)$ is the unique eigenvalue of maximal modulus of
$T_\alpha$;

(ii) $Q_\alpha$ is a rank-one projection;

(iii) the mappings $\lambda(\alpha)$, $Q_\alpha$ and $I - Q_\alpha$ are
analytic;

(iv) $|\lambda(\alpha)| > (2 + \rho_*)/3$ and for each $k \in N$, the set of
positive integers, there exists $c > 0$ such that, for each $n \in N$ and
$j_1, \dots, j_p$ with $j_1 + \cdots + j_p = k$,

$$\left\| \frac{\partial^k}{\partial \alpha_1^{j_1} \cdots \partial \alpha_p^{j_p}} T_\alpha^n (I - Q_\alpha) \right\|_{wh} \le c \left( \frac{1 + 2\rho_*}{3} \right)^{n};$$

(v) denote $g = (g_1, \dots, g_p)$, and let
$\gamma_j := \lim_{n \to \infty} (1/n) E_{yu} \log \|g_j(M_n)\|$, the upper
Lyapunov exponent; it follows that

(4.13)  $$\gamma_j = \frac{\partial \lambda(\alpha)}{\partial \alpha_j} \bigg|_{\alpha = 0} = \int E_{yu}\, g_j(M_1)\, \Pi(dy \times du).$$

Note that in Lemma 2 we need the extra assumption that the induced 
Markov chain {(Y n , M n ), n > 0} with transition probability kernel (3.2) is 
irreducible, aperiodic and Harris recurrent. In Section 5 we will show that 
this condition is satisfied for the Markov chain induced by the Markovian 
iterated random functions system (2.4)-(2.7). 

For given $S_n = \sum_{k=1}^{n} g(M_k)$ of the MIRFS
$\{(Y_n, M_n), n \ge 0\}$, in this section we will obtain Edgeworth
expansions for the standardized distribution of $S_n$ via the
representation (4.12) of the characteristic function
$E(e^{i\alpha' S_n} \mid Y_0 = y, M_0 = u)$. Note that Lemma 1 implies that
$\{(Y_n, M_n), n \ge 0\}$ is geometrically mixing in the sense that there
exist $\gamma_1 > 0$ and $0 < \rho_1 < 1$ such that, for all
$y \in \mathcal{Y}$, $u \in M$, $k \ge 0$ and $n \ge 1$ and for all
real-valued measurable functions $\varphi_1, \varphi_2$ with
$\|\varphi_1^2\|_{wh} < \infty$ and $\|\varphi_2^2\|_{wh} < \infty$,

(4.14)  $$\big\| E\{ \varphi_1(Y_k, M_k)\, \varphi_2(Y_{k+n}, M_{k+n}) \mid Y_0 = y, M_0 = u \} - E\{ \varphi_1(Y_k, M_k) \mid Y_0 = y, M_0 = u \}\, E\{ \varphi_2(Y_{k+n}, M_{k+n}) \mid Y_0 = y, M_0 = u \} \big\|_{wh} \le \gamma_1 \rho_1^n.$$

Let $\varphi_1, \varphi_2$ be real-valued measurable functions on
$(\mathcal{Y} \times M) \times (\mathcal{Y} \times M)$. Denote
$\bar{\varphi}_1(z, v) = E\{ \varphi_1((z, v), (Y_1, M_1)) \mid Y_0 = z, M_0 = v \}$,
and note that

$$E\{ \varphi_1((Y_k, M_k), (Y_{k+1}, M_{k+1})) \mid Y_0 = y, M_0 = u \} = E\{ \bar{\varphi}_1(Y_k, M_k) \mid Y_0 = y, M_0 = u \}.$$

The same proof as that of Theorem 16.1.5 of [46] can be used to show that
there exist $\gamma_1 > 0$ and $0 < \rho_1 < 1$ such that, for all
$y \in \mathcal{Y}$, $u \in M$, $k \ge 0$ and $n \ge 1$ and for all
measurable $\varphi_1, \varphi_2$ with
$\|\sup_{z,v} \varphi_1^2((y, u), (z, v))\|_{wh} < \infty$ and
$\|\sup_{z,v} \varphi_2^2((y, u), (z, v))\|_{wh} < \infty$,

(4.15)  $$\big\| E\{ \varphi_1((Y_k, M_k), (Y_{k+1}, M_{k+1}))\, \varphi_2((Y_{k+n}, M_{k+n}), (Y_{k+n+1}, M_{k+n+1})) \mid Y_0 = y, M_0 = u \} - E\{ \bar{\varphi}_1(Y_k, M_k) \mid Y_0 = y, M_0 = u \}\, E\{ \bar{\varphi}_2(Y_{k+n}, M_{k+n}) \mid Y_0 = y, M_0 = u \} \big\|_{wh} \le \gamma_1 \rho_1^n.$$
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To establish Edgeworth expansion for a Markovian iterated random func- 
tions system, we shall make use of (4.15) in conjunction with the following 
extension of Cramer (strongly nonlattice) condition: 

(4.16) inf |1 - E v {exp(ii/ S!(jg)}\ > for all a > 0. 

\v\>a 

In addition, we also assume the conditional Cramér (strongly nonlattice) condition ((2.5) on page 216 in [31]): There exists $\delta > 0$ such that, for all $m, n = 1, 2, \ldots$, $\delta^{-1} < m < n$, and all $\alpha \in \mathbf{R}^p$ with $|\alpha| \ge \delta$,

$$E_\Pi \bigl| E\{\exp(i\alpha'(g(M_{n-m}) + \cdots + g(M_{n+m}))) \mid (Y_{n-m},M_{n-m}), \ldots, (Y_{n-1},M_{n-1}), (Y_{n+1},M_{n+1}), \ldots, (Y_{n+m+1},M_{n+m+1})\} \bigr| \le e^{-\delta}. \tag{4.17}$$

Let

$$\gamma = \int E_{(y,u)}\, g(M_1)\, \Pi(dy \times du) \;(= \Lambda'(0)), \tag{4.18}$$

and denote by $V = (\partial^2 \Lambda(\alpha)/\partial\alpha_i\, \partial\alpha_j |_{\alpha=0})_{1 \le i,j \le p}$ the Hessian matrix of $\Lambda$ at 0. By Lemma 2,

$$\lim_{n\to\infty} n^{-1} E_\nu\{(g(M_n) - n\gamma)(g(M_n) - n\gamma)'\} = V. \tag{4.19}$$

Let $\psi_n(\alpha) = E_\nu(e^{i\alpha' g(M_n)})$. Then by Lemma 2 and the fact that $\nu_\alpha Q_\alpha h_1$ has continuous partial derivatives of order $r - 2$ in some neighborhood of $\alpha = 0$, we have the Taylor series expansion of $\psi_n(\alpha/\sqrt{n})$ for $|\alpha/\sqrt{n}| \le \varepsilon$ (some sufficiently small positive number):

$$\psi_n(\alpha/\sqrt{n}) = \Bigl[ 1 + \sum_{j=1}^{r-2} n^{-j/2}\, \tilde\pi_j(i\alpha) \Bigr] e^{-\alpha' V \alpha/2} + o(n^{-(r-2)/2}), \tag{4.20}$$

where $\tilde\pi_j(i\alpha)$ is a polynomial in $i\alpha$ of degree $3j$ whose coefficients are smooth functions of the partial derivatives of $\Lambda(\alpha)$ at $\alpha = 0$ up to the order $j + 2$ and those of $\nu_\alpha Q_\alpha h_1$ at $\alpha = 0$ up to the order $j$. Letting $D$ denote the $p \times 1$ vector whose $j$th component is the partial differentiation operator with respect to the $j$th coordinate, define the differential operator $\tilde\pi_j(-D)$. As in the case of sums of i.i.d. zero-mean random vectors (cf. [5]), we obtain an Edgeworth expansion for the "formal density" of the distribution of $g(M_n)$ by replacing the $\tilde\pi_j(i\alpha)$ and $e^{-\alpha' V \alpha/2}$ in (4.20) by $\tilde\pi_j(-D)$ and $\phi_V(y)$, respectively, where $\phi_V$ is the density function of the $p$-variate normal distribution with mean 0 and covariance matrix $V$. Throughout the sequel we let $P_\nu$ denote the
probability measure under which $(Y_0, M_0)$ has initial distribution $\nu$.

Theorem 2. Let $\{(Y_n, M_n), n \ge 0\}$ be the MIRFS defined in (2.1) satisfying Assumption K, such that the induced Markov chain $\{(Y_n, M_n), n \ge 0\}$, with transition probability kernel (3.2), is irreducible, aperiodic and Harris recurrent. Assume that $g \in C^r(Q)$ for some $r \ge 2$ and that (4.16) and (4.17) hold. Let $\phi_{j,V} = \tilde\pi_j(-D)\phi_V$ for $j = 1, \ldots, r - 2$. For $0 < a \le 1$ and $c > 0$, let $\mathcal{B}_{a,c}$ be the class of all Borel subsets $B$ of $\mathbf{R}^p$ such that $\int_{(\partial B)^\varepsilon} \phi_V(y)\, dy \le c\varepsilon^a$ for every $\varepsilon > 0$, where $\partial B$ denotes the boundary of $B$ and $(\partial B)^\varepsilon$ denotes its $\varepsilon$-neighborhood. Then

$$\sup_{B \in \mathcal{B}_{a,c}} \left| P_\nu\{(S_n - n\gamma)/\sqrt{n} \in B\} - \int_B \Bigl\{ \phi_V(y) + \sum_{j=1}^{r-2} n^{-j/2} \phi_{j,V}(y) \Bigr\}\, dy \right| = o(n^{-(r-2)/2}). \tag{4.21}$$



A proof of Theorem 2 is given in the Appendix. 

Note that, under weaker moment conditions and an alternative to conditions (4.16) and (4.17) (see Condition 1 of [42]), Lahiri [42] proved asymptotic expansions for sums of weakly dependent random vectors.
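To make the shape of the expansion in Theorem 2 concrete, the following sketch looks at the classical scalar i.i.d. case only (not the Markovian setting of the theorem). It compares the normal approximation with the one-term Edgeworth correction for a standardized sum of Exp(1) variables, whose exact distribution is a Gamma law. The function names and the choice of the exponential distribution are illustrative assumptions, not part of the paper.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def edgeworth_cdf(x, n, skewness):
    """One-term Edgeworth approximation for a standardized i.i.d. sum.

    Classical formula: Phi(x) - phi(x) * skewness * (x^2 - 1) / (6 sqrt(n)).
    """
    correction = skewness * (x * x - 1.0) / (6.0 * math.sqrt(n))
    return normal_cdf(x) - normal_pdf(x) * correction


def exact_cdf_exponential_sum(x, n):
    """Exact CDF of (S_n - n) / sqrt(n) when S_n is a sum of n Exp(1) variables.

    S_n ~ Gamma(n, 1), so P{S_n <= s} = 1 - exp(-s) * sum_{k<n} s^k / k!.
    """
    s = n + x * math.sqrt(n)  # Exp(1) has mean 1 and variance 1
    if s <= 0.0:
        return 0.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= s / k
        total += term
    return 1.0 - math.exp(-s) * total


n, x = 10, 0.5
exact = exact_cdf_exponential_sum(x, n)
err_normal = abs(normal_cdf(x) - exact)
# Exp(1) has skewness 2.
err_edgeworth = abs(edgeworth_cdf(x, n, skewness=2.0) - exact)
print(exact, err_normal, err_edgeworth)
```

In this toy case the skewness term removes most of the error of the plain normal approximation, which is the role played by the $n^{-1/2}$ term in an expansion of the form (4.21).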

Letting r = 2 in Theorem 2, we have the following: 

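Corollary 1. With the same notation and assumptions as in Theorem 1,

$$\frac{1}{\sqrt{n}}(S_n - n\gamma) \longrightarrow N(0, \Sigma) \quad \text{in distribution},$$

where the variance-covariance matrix

$$\Sigma = \left( \frac{\partial^2 \Lambda(\alpha)}{\partial\alpha_i\, \partial\alpha_j} \bigg|_{\alpha=0} \right)_{i,j=1,\ldots,p}. \tag{4.22}$$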



In statistical applications one often works with $\mathbf{g}(n^{-1}S_n)$ instead of $S_n = \sum_{k=1}^n g(M_k)$, where $\mathbf{g} : \mathbf{R}^p \to \mathbf{R}^q$ is sufficiently smooth in some neighborhood of the mean $\gamma := (\gamma_1, \ldots, \gamma_p)$. Denote $\mathbf{g} = (g_1, \ldots, g_q)$ with each $g_i$, $1 \le i \le q$, a real-valued function on $\mathbf{R}^p$. For the case of a sum of i.i.d. random variables, Bhattacharya and Ghosh [4] made use of the Edgeworth expansion of the distribution of $(S_n - n\gamma)/\sqrt{n}$ to derive an Edgeworth expansion of the distribution of $\sqrt{n}\{\mathbf{g}(n^{-1}S_n) - \mathbf{g}(\gamma)\}$. Making use of Theorem 2 and a straightforward extension of their argument, we can generalize their result to the case where $S_n$ is the partial sum of a Markovian iterated random functions system.

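Theorem 3. Under the same assumptions as in Theorem 2, suppose that $\mathbf{g} : \mathbf{R}^p \to \mathbf{R}^q$ has continuous partial derivatives of order $r$ in some neighborhood of $\gamma$. Let $J_{\mathbf{g}} = (D_j g_i(\gamma))_{1 \le i \le q,\, 1 \le j \le p}$ be the $q \times p$ Jacobian matrix and let $V(\mathbf{g}) = J_{\mathbf{g}} V J_{\mathbf{g}}'$. Then

$$\sup_{B \in \mathcal{B}_{a,c}} \left| P_\nu\{\sqrt{n}(\mathbf{g}(n^{-1}S_n) - \mathbf{g}(\gamma)) \in B\} - \int_B \Bigl\{ \phi_{V(\mathbf{g})}(y) + \sum_{j=1}^{r-2} n^{-j/2} \phi_{j,V,\mathbf{g}}(y) \Bigr\}\, dy \right| = o(n^{-(r-2)/2}), \tag{4.23}$$

where $\phi_{j,V,\mathbf{g}} = \tilde\pi_{j,\mathbf{g}}(-D)\phi_V$ and $\tilde\pi_{j,\mathbf{g}}(y)$ is a polynomial in $y$ $(\in \mathbf{R}^p)$ whose coefficients are smooth functions of the partial derivatives of $\Lambda(\alpha)$ at $\alpha = 0$ up to order $j + 2$ and those of $\nu_\alpha Q_\alpha h_1$ at $\alpha = 0$ up to order $j$, together with those of $\mathbf{g}$ at $\gamma$ up to order $j + 1$.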

In the next theorem we consider $p = 1$.

Theorem 4. Under the same assumptions as in Theorem 2, assume $g \in C^r(Q)$ for some $r \ge 2$. Then

(4 24) 1 ~ PMS ^'^ s t} - -py/vs MWS) (i + o (-*■) ) 

and 

Pz,{(^n-n 7 )/^< -t} , ,3/ y — \ / ,/ / — \ f -\ , t 



exp(-t 3 /^)^-t/^)(l + o(-^=)) 



*(-*) 
(4.25) 

where $\Phi(t)$ is the standard normal distribution function, and $\psi(t)$ is a power series which converges for $t$ sufficiently small in absolute value.

Theorem 4 states the moderate deviations results for the distribution of 
an MIRFS, which will be used to prove Edgeworth expansion for the MLE in 
Section 5. Since the proof is a straightforward generalization of Theorem 6 
in [47], it will not be repeated here. 

5. Efficient likelihood estimation. For a given state space model defined in (2.1) which involves several parameters $\theta = (\theta_1, \ldots, \theta_q)$, the estimation problem we consider in this section is that of estimating one of the parameters at a time; the other parameters play the role of nuisance parameters. The true parameter is denoted by $\theta_0$. Recall $p_n = p_n(\xi_0, \xi_1, \ldots, \xi_n; \theta)$ defined in (2.3). When $\partial \log p_n/\partial\theta$ exists, one can seek solutions of the likelihood equations

$$\frac{\partial \log p_n}{\partial \theta} = 0. \tag{5.1}$$

In the following, we denote by $E_x^\theta$ the expectation defined under $P^\theta(\cdot,\cdot)$ in (2.1) with initial state $X_0 = x$, and by $E_{(x,s)}^\theta$ the expectation defined under $P^\theta(\cdot,\cdot)$ in (2.1) with initial state $X_0 = x$, $\xi_0 = s$. The following conditions will be used throughout the rest of this paper.

C1. For given $\theta \in \Theta$, the Markov chain $\{(X_n, \xi_n), n \ge 0\}$ defined in (2.1) and (2.2) is aperiodic, irreducible, and satisfies (4.1) and (4.2) with weight function $w(\cdot)$. Assume $0 < p_\theta(x,y) < \infty$ for all $x, y \in \mathcal{X}$, and $0 < \sup_{x \in \mathcal{X}} f(s_1; \theta | x, s_0) < \infty$ for all $s_0, s_1 \in \mathbf{R}^d$. Denote $g_\theta(s_0, s_1) = \sup_{x_0 \in \mathcal{X}} \int p_\theta(x_0, x_1)\, f(s_1; \theta | x_1, s_0)\, m(dx_1)$. Furthermore, we assume that there exists $p \ge 1$ as in Assumption K2 such that

(5.2) sup /(lo, S o){lo g (^^,err^g^)}<0, 



(5.3) 



SU P E (x Q s )i fte0o,6)— 7 v 



< oo. 



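C2. The true parameter $\theta_0$ is an interior point of $\Theta$. For all $x \in \mathcal{X}$, $s_0, s_1 \in \mathbf{R}^d$, $\theta \in \Theta \subset \mathbf{R}^q$, and for $i, j, k = 1, \ldots, q$, the partial derivatives

$$\frac{\partial f(s_0; \theta | x)}{\partial\theta_i}, \qquad \frac{\partial^2 f(s_0; \theta | x)}{\partial\theta_i\, \partial\theta_j}, \qquad \frac{\partial^3 f(s_0; \theta | x)}{\partial\theta_i\, \partial\theta_j\, \partial\theta_k},$$

as well as the partial derivatives

$$\frac{\partial f(s_1; \theta | x, s_0)}{\partial\theta_i}, \qquad \frac{\partial^2 f(s_1; \theta | x, s_0)}{\partial\theta_i\, \partial\theta_j}, \qquad \frac{\partial^3 f(s_1; \theta | x, s_0)}{\partial\theta_i\, \partial\theta_j\, \partial\theta_k},$$

exist, and for all $x, y \in \mathcal{X}$, $\theta \to p_\theta(x, y)$ and $\theta \to \pi_\theta(x)$ have twice continuous derivatives in some neighborhood $N_\delta(\theta_0) := \{\theta : |\theta - \theta_0| < \delta\}$ of $\theta_0$.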
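C3. For all $i, j = 1, \ldots, q$,

$$\int_{\mathcal{X}} \sup_{\theta \in N_\delta(\theta_0)} \left| \frac{\partial \pi_\theta(x)}{\partial\theta_i} \right| m(dx) < \infty, \qquad \int_{\mathcal{X}} \sup_{\theta \in N_\delta(\theta_0)} \left| \frac{\partial^2 \pi_\theta(x)}{\partial\theta_i\, \partial\theta_j} \right| m(dx) < \infty,$$

and for all $x \in \mathcal{X}$, $i, j = 1, \ldots, q$,

$$\int_{\mathcal{X}} \sup_{\theta \in N_\delta(\theta_0)} \left| \frac{\partial p_\theta(x,y)}{\partial\theta_i} \right| m(dy) < \infty, \qquad \int_{\mathcal{X}} \sup_{\theta \in N_\delta(\theta_0)} \left| \frac{\partial^2 p_\theta(x,y)}{\partial\theta_i\, \partial\theta_j} \right| m(dy) < \infty.$$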



C4. For all $x \in \mathcal{X}$, $s_0 \in \mathbf{R}^d$ and $\theta \in \Theta$,

3/(£o;%) 



(x,so) 



30* 

a/(6;g|a,g ) 

89i 



< oo, 



< oo, 



E 



d 2 f(Zo;9\x) 



E\ 



(x.so) 



89 i 89 j 

d 2 /(6;%,sp) 
89 i 86 j 



< oo, 



< oo. 
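Furthermore, we assume that, for all $x \in \mathcal{X}$, $s_0 \in \mathbf{R}^d$ and uniformly for $\theta \in N_\delta(\theta_0)$,

$$\left| \frac{\partial^3 \log f(\xi_0; \theta | x)}{\partial\theta_i\, \partial\theta_j\, \partial\theta_k} \right| < H_{ijk}(x, \xi_0), \qquad \left| \frac{\partial^3 \log f(\xi_1; \theta | x, s_0)}{\partial\theta_i\, \partial\theta_j\, \partial\theta_k} \right| < G_{ijk}((x, s_0), \xi_1),$$

where $H_{ijk}$ and $G_{ijk}$ are such that $E_x^{\theta_0} H_{ijk}(x, \xi_0) < \infty$ and $E_{(x,s_0)}^{\theta_0} G_{ijk}((x, s_0), \xi_1) < \infty$, for all $i, j, k = 1, \ldots, q$ and for all $x \in \mathcal{X}$, $s_0 \in \mathbf{R}^d$.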



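C5.

$$\sup_{x \in \mathcal{X}} E_x^{\theta_0} \left( \sup_{|\theta - \theta_0| < \delta}\; \sup_{y,z \in \mathcal{X}} \frac{f(\xi_0; \theta | y)\, f(\xi_1; \theta | y, \xi_0)}{f(\xi_0; \theta | z)\, f(\xi_1; \theta | z, \xi_0)} \right)^2 < \infty.$$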



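C6. The equality

$$p_n(\xi_0, \xi_1, \ldots, \xi_n; \theta) = p_n(\xi_0, \xi_1, \ldots, \xi_n; \theta')$$

holds $P$-almost surely, for all nonnegative $n$, if and only if $\theta = \theta'$.

C7. For all $x, y \in \mathcal{X}$, $\theta \to p_\theta(x, y)$, $\theta \to \pi_\theta(x)$ and $\theta \to \varphi_x(\theta)$ are continuous, and $\theta \to f(s_0; \theta | x)$, as well as $\theta \to f(s_1; \theta | x, s_0)$, are continuous for all $x \in \mathcal{X}$ and $s_0, s_1 \in \mathbf{R}^d$. Furthermore, for all $x \in \mathcal{X}$ and $s_0, s_1 \in \mathbf{R}^d$, $f(s_0; \theta | x) \to 0$ and $f(s_1; \theta | x, s_0) \to 0$, as $|\theta| \to \infty$.

C8. $E_x^{\theta_0} |\log(f(\xi_0; \theta_0 | x)\, f(\xi_1; \theta_0 | x, \xi_0))| < \infty$ for all $x \in \mathcal{X}$.

C9. For each $\theta \in \Theta$, there is $\delta > 0$ such that, for all $x \in \mathcal{X}$,

$$E_x^{\theta_0} \Bigl( \sup_{|\theta' - \theta| < \delta} [\log(f(\xi_0; \theta' | x)\, f(\xi_1; \theta' | x, \xi_0))]^+ \Bigr) < \infty,$$

where $a^+ = \max\{a, 0\}$. And there is a $b > 0$ such that, for all $x \in \mathcal{X}$,

$$E_x^{\theta_0} \Bigl( \sup_{|\theta'| > b} [\log(f(\xi_0; \theta' | x)\, f(\xi_1; \theta' | x, \xi_0))]^+ \Bigr) < \infty.$$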



Remark 2. (a) Condition C1 is the $w$-uniform ergodicity condition for the underlying Markov chain, which is considerably weaker than the uniformly recurrent condition A1 of [39], and that of [14]. Furthermore, we impose conditions (5.2) and (5.3) to guarantee that the induced Markovian iterated random functions system satisfies Assumptions K2 and K3 in Section 4.

(b) To gain a better understanding of these properties, we first consider a simple state space model $X_n = aX_{n-1} + \varepsilon_n$, $\xi_n = X_n + \eta_n$, where $|a| < 1$, $\varepsilon_n$ and $\eta_n$ are i.i.d. standard normal random variables, and they are mutually independent. Since the $\xi_n$ are independent for given $X_n$, the weight function $w$ depends on $X_0$ only and we have $w(x) = |x| + 1$. Note that $\mathcal{X} = \mathbf{R}$. Denote $b = (1 - |a|^p)/(1 - |a|)$. Observe that

$$\sup_{x \in \mathbf{R}} \int_{-\infty}^{\infty} \frac{\exp\{-(y - ax)^2/2\}}{\sqrt{2\pi}}\, \frac{\exp\{-(s - y)^2/2\}}{\sqrt{2\pi}}\, dy = \sup_{x \in \mathbf{R}} \frac{\exp\{-(ax - s)^2/4\}}{\sqrt{4\pi}} \int_{-\infty}^{\infty} \frac{\exp\{-(y - (ax + s)/2)^2/(2(1/2))\}}{\sqrt{2\pi(1/2)}}\, dy = \frac{1}{\sqrt{4\pi}} \sup_{x \in \mathbf{R}} \exp\{-(ax - s)^2/4\} = \frac{1}{\sqrt{4\pi}}.$$

A simple calculation shows that the left-hand side of (5.2) is bounded above by

$$\log \sup_{x_0 \in \mathbf{R}} E_{x_0}^\theta \left\{ \frac{|a^p x_0 + \sum_{k=0}^{p-1} a^k \varepsilon_{p-k}| + 1}{(4\pi)^{p/2}(|x_0| + 1)} \right\} \le \log \sup_{x_0 \in \mathbf{R}} \left\{ \frac{|a^p x_0| + E\sum_{k=0}^{p-1} |a^k \varepsilon_{p-k}| + 1}{(4\pi)^{p/2}(|x_0| + 1)} \right\} = \log \sup_{x_0 \in \mathbf{R}} \left\{ \frac{|a|^p |x_0| + 2b/\sqrt{2\pi} + 1}{(4\pi)^{p/2}(|x_0| + 1)} \right\} < 0. \tag{5.4}$$

This implies that (5.2) holds. By using the same argument, we see that (5.3) holds.

Next, we consider the case that $\varepsilon_n$ and $\eta_n$ are i.i.d. double exponential(1) random variables. Observe that

$$\sup_{x \in \mathbf{R}} \int_{-\infty}^{\infty} \frac{\exp\{-|y - ax|\}}{2}\, \frac{\exp\{-|s - y|\}}{2}\, dy = \frac{1}{4} \sup_{x \in \mathbf{R}} \bigl( (1 + |ax - s|) \exp\{-|ax - s|\} \bigr) = \frac{1}{4}.$$

By making use of the same argument as in (5.4), we see that (5.2) and (5.3) hold. The extension to $\xi_n = \beta_{X_n} \xi_{n-1} + \eta_n$, studied in Remark 1(b), is straightforward and will not be repeated here. Other models used in practice, including Markov switching models, ARMA models, (G)ARCH models and SV models, will be given in Section 6.

(c) Note that the mean contraction property $E \log L_1 < 0$ is not satisfied in the above examples. Instead of applying Theorem 1 directly, we will explore the special structure of the likelihood function in Lemma 4 below, such that $\{((X_n, \xi_n), M_n), n \ge 0\}$ is an irreducible, aperiodic and Harris recurrent Markov chain. Hence, we can apply Theorem 1 for the Markovian iterated random functions system on $\mathbf{M}$ induced from (2.4)-(2.7).

(d) C2-C4 are standard smoothness conditions. C5 is the technical condition for the existence of the Fisher information to be defined in (5.9) below. C8 and C9 are integrability conditions that will be used to prove strong consistency of the MLE. Condition C6 is the identifiability condition for state space models. That is, the family of mixtures of $\{f(\xi_1; \theta | x, \xi_0) : \theta \in \Theta\}$ is identifiable. This condition will be used to prove strong consistency of the MLE. Although it is difficult to check this condition in a general state space model, in many models of interest the parameter itself is identifiable only up to a permutation of states, such as a finite state hidden Markov model with normal distributions. A sufficient condition for identifiability can be found in Theorem 1 of [14]. See also the paper by Ito, Amari and Kobayashi [38] for necessary and sufficient conditions in the case that the state space is finite and $\xi_j$ is a deterministic function of $X_j$.

(e) When the state space of the Markov chain $\{X_n, n \ge 0\}$ is finite, and the observations $\xi_n$ are conditionally independent, this reduces to the so-called hidden Markov model. It is easy to see that condition C1 implies (A1) by choosing $w(x) = 1$, and conditions C2-C4 reduce to (A2), (A3) and (A5) of [7]. Conditions C6-C9 reduce to conditions C1-C6 in [44]. We will discuss condition C5 in Remark 3 after Lemma 5.
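The two closed-form integrals quoted in Remark 2(b) can be checked numerically. The sketch below is only a sanity check under stated assumptions: the trapezoid rule, the integration window and the particular values of $a$, $x$ and $s$ are illustrative choices, and the helper names are hypothetical.

```python
import math


def trapezoid(f, lo, hi, steps):
    """Composite trapezoid rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


def gauss_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def laplace_pdf(z):
    """Double exponential(1) density."""
    return 0.5 * math.exp(-abs(z))


def one_step_integral(pdf, a, x, s, half_width=40.0, steps=80000):
    """Numerical value of the integral over y of pdf(y - a*x) * pdf(s - y)."""
    centre = 0.5 * (a * x + s)
    return trapezoid(lambda y: pdf(y - a * x) * pdf(s - y),
                     centre - half_width, centre + half_width, steps)


a, x, s = 0.6, 1.5, -0.4
d = a * x - s

gauss_numeric = one_step_integral(gauss_pdf, a, x, s)
gauss_closed = math.exp(-d * d / 4.0) / math.sqrt(4.0 * math.pi)

laplace_numeric = one_step_integral(laplace_pdf, a, x, s)
laplace_closed = 0.25 * (1.0 + abs(d)) * math.exp(-abs(d))

# The supremum over x is attained where a*x = s, giving 1/sqrt(4*pi) and 1/4.
x_star = s / a
gauss_sup = one_step_integral(gauss_pdf, a, x_star, s)
laplace_sup = one_step_integral(laplace_pdf, a, x_star, s)
print(gauss_numeric, gauss_closed, laplace_numeric, laplace_closed)
```

Both integrands are convolutions of the noise density with itself evaluated at $ax - s$, which is why the supremum over $x$ does not depend on $s$.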

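Let $\{(X_n, \xi_n), n \ge 0\}$ be the Markov chain defined in (2.1) and (2.2). Recall from (2.8) that the log likelihood can be written as

$$l(\theta) = \log p_n(\xi_0, \xi_1, \ldots, \xi_n; \theta) = \log \|\mathbf{P}_\theta(\xi_n) \circ \cdots \circ \mathbf{P}_\theta(\xi_1) \circ \mathbf{P}_\theta(\xi_0)\pi\| = \log \frac{\|\mathbf{P}_\theta(\xi_n) \circ \cdots \circ \mathbf{P}_\theta(\xi_1) \circ \mathbf{P}_\theta(\xi_0)\pi\|}{\|\mathbf{P}_\theta(\xi_{n-1}) \circ \cdots \circ \mathbf{P}_\theta(\xi_1) \circ \mathbf{P}_\theta(\xi_0)\pi\|} + \cdots + \log \frac{\|\mathbf{P}_\theta(\xi_1) \circ \mathbf{P}_\theta(\xi_0)\pi\|}{\|\mathbf{P}_\theta(\xi_0)\pi\|}. \tag{5.5}$$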
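The decomposition of the log likelihood into one-step log ratios can be illustrated on a finite-state toy model. The sketch below assumes a two-state hidden Markov model with conditionally independent Gaussian observations, takes the operator for an observation to be multiplication by the transition matrix followed by the emission density, and takes the norm to be the sum over states. These are illustrative stand-ins for the definitions in (2.4)-(2.7), which are not reproduced here, and the contribution of the initial observation is tracked separately.

```python
import itertools
import math


def gauss_pdf(z, mean, sd):
    """Normal density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def forward_log_ratios(obs, pi, trans, means, sd):
    """Return log||P(xi_0) pi|| and the successive one-step log ratios.

    h is rescaled to sum to one after every step, so the norm of the updated
    h equals the ratio of consecutive unnormalised norms.
    """
    states = range(len(pi))
    h = [pi[x] * gauss_pdf(obs[0], means[x], sd) for x in states]
    norm = sum(h)
    log_initial = math.log(norm)
    h = [v / norm for v in h]
    ratios = []
    for xi in obs[1:]:
        h = [sum(h[x] * trans[x][y] for x in states) * gauss_pdf(xi, means[y], sd)
             for y in states]
        norm = sum(h)
        ratios.append(math.log(norm))
        h = [v / norm for v in h]
    return log_initial, ratios


def brute_force_log_likelihood(obs, pi, trans, means, sd):
    """Log likelihood obtained by summing over every hidden state path."""
    total = 0.0
    for path in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * gauss_pdf(obs[0], means[path[0]], sd)
        for k in range(1, len(obs)):
            p *= trans[path[k - 1]][path[k]] * gauss_pdf(obs[k], means[path[k]], sd)
        total += p
    return math.log(total)


pi = [0.5, 0.5]                      # hypothetical initial distribution
trans = [[0.9, 0.1], [0.2, 0.8]]     # hypothetical transition matrix
means, sd = [-1.0, 2.0], 1.0         # hypothetical emission parameters
obs = [0.3, -1.2, 2.5, 1.9, -0.7, 0.1]

log_initial, ratios = forward_log_ratios(obs, pi, trans, means, sd)
log_lik_forward = log_initial + sum(ratios)
log_lik_brute = brute_force_log_likelihood(obs, pi, trans, means, sd)
print(log_lik_forward, log_lik_brute)
```

Rescaling at every step is what makes the recursion numerically stable for long series; the brute-force sum is included only to confirm that the telescoping sum reproduces the likelihood.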

For each $n$, denote

$$M_n := \mathbf{P}_\theta(\xi_n) \circ \cdots \circ \mathbf{P}_\theta(\xi_1) \circ \mathbf{P}_\theta(\xi_0) \tag{5.6}$$

as the Markovian iterated random functions system on $\mathbf{M}$ induced from (2.4)-(2.7). Then $\{((X_n, \xi_n), M_n), n \ge 0\}$ is a Markov chain on the state space $(\mathcal{X} \times \mathbf{R}^d) \times \mathbf{M}$, with transition probability kernel $P_\theta$ defined as in (3.2). Let $\Pi_\theta$ be the stationary distribution of $\{((X_n, \xi_n), M_n), n \ge 0\}$ defined in Theorem 1(iii). Then the log-likelihood function $l(\theta)$ can be written as $S_n := \sum_{k=1}^n g(M_{k-1}, M_k)$ with

$$g(M_{k-1}, M_k) := \log \frac{\|\mathbf{P}_\theta(\xi_k) \circ \cdots \circ \mathbf{P}_\theta(\xi_1) \circ \mathbf{P}_\theta(\xi_0)\pi\|}{\|\mathbf{P}_\theta(\xi_{k-1}) \circ \cdots \circ \mathbf{P}_\theta(\xi_1) \circ \mathbf{P}_\theta(\xi_0)\pi\|}. \tag{5.7}$$

In order to apply Theorems 1-4, we need to check that the Markovian iterated random functions system satisfies Assumption K, and that the induced Markov chain is aperiodic, irreducible and Harris recurrent. For this purpose, we need to define a suitable metric on the space $\mathbf{M}$, which has been defined in (2.4). First, we add a further condition on $\mathbf{M}$ to have

$$\mathbf{M} = \Bigl\{ h \,\Big|\, h : \mathcal{X} \to \mathbf{R}^+ \text{ is } m\text{-measurable}, \int h(x)\, m(dx) < \infty \text{ and } \sup_{x \in \mathcal{X}} h(x) < \infty \Bigr\}.$$

For convenience of notation, we still use the notation $\mathbf{M}$, and will use $h$ to represent an element in $\mathbf{M}$, which is different from the notation $u$ used in Sections 3 and 4. We define the variation distance between any two elements $h_1, h_2$ in $\mathbf{M}$ by

$$d(h_1, h_2) = \sup_{x \in \mathcal{X}} |h_1(x) - h_2(x)|. \tag{5.8}$$

Note that $(\mathbf{M}, d)$ is a complete metric space with Borel $\sigma$-algebra $\mathcal{B}(\mathbf{M})$, but it is not separable. Thus, Theorems 1-4 do not apply directly. However, rather than deal with the measure-theoretic technicalities created by an inseparable space, we can apply the results developed in Section 7 of [13] for a direct argument of convergence. Therefore, Theorems 1-4 still hold under the regularity conditions.

In order to describe our main results, we need the following lemmas first. Their proofs are given in Section 7.

Lemma 3. Assume C1-C5 hold or C1, C6-C9 hold. Then for each $\theta \in \Theta$ and $j = 1, \ldots, n$, the random functions $\mathbf{P}_\theta(\xi_0)$ and $\mathbf{P}_\theta(\xi_j)$, defined in (2.5) and (2.6), from $(\mathcal{X} \times \mathbf{R}^d) \times \mathbf{M}$ to $\mathbf{M}$ are Lipschitz continuous in the second argument, and the Markovian iterated random functions system (2.4)-(2.7) satisfies Assumption K. Furthermore, the function $g$ defined in (5.7) belongs to $C^r(Q)$ for any $r > 0$.

For each $\theta \in \Theta$, recall that $\{((X_n, \xi_n), M_n), n \ge 0\}$ is a Markov chain induced by the Markovian iterated random functions system (2.4)-(2.7) on the state space $(\mathcal{X} \times \mathbf{R}^d) \times \mathbf{M}$.

Lemma 4. Assume C1-C5 hold or C1, C6-C9 hold. Then for each $\theta \in \Theta$, $\{((X_n, \xi_n), M_n), n \ge 0\}$ is an aperiodic, $(m \times Q \times Q)$-irreducible and Harris recurrent Markov chain.

Lemma 5. Assume C1-C5 hold. Then the Fisher information matrix

$$\mathbf{I}(\theta) = (I_{ij}(\theta)) = \left( E_\Pi^\theta \left[ \frac{\partial \log \|\mathbf{P}_\theta(\xi_1) \circ \mathbf{P}_\theta(\xi_0)\pi\|}{\partial\theta_i}\, \frac{\partial \log \|\mathbf{P}_\theta(\xi_1) \circ \mathbf{P}_\theta(\xi_0)\pi\|}{\partial\theta_j} \right] \right) \tag{5.9}$$

is positive definite for $\theta$ in a neighborhood $N_\delta(\theta_0)$ of $\theta_0$. Recall that $E_\Pi^\theta := E_\Pi$ is defined as the expectation under $P_\Pi$ in (3.2).



Remark 3. Note that the Fisher information (5.9) is defined as the expected value under the stationary distribution $\Pi_\theta$ of the Markov chain $\{((X_n, \xi_n), M_n), n \ge 0\}$. It is worth mentioning that only $\xi_n$ appears in $M_n$, which reflects the nature of state space models.

When the state space X is finite, and the random variables £ n are 
conditionally independent for given X n , let H := H(£i, £q, . . .) = 



where 

?e o f 0\ogf{£ m ;9\X n 



E 



E 



09 

,o fd\ogf(U;0\x,, 



09 



tufa- 



+ E 6 



9 o( logpe (X m , X m+1 ) 



00 



E" 



e o f dlogpe(X m X m+1 ) 



oe 



£l>£0; • • ■ 
f0)£-l) • • 



Under their Assumptions 1-4, Bickel and Ritov [6] showed that H G C 2 (P e °) 
and defined I H (0°) := E e ° {HH 1 }. They also showed that 



lim -E d ° 

n— >oo n 



Olog ||r ra 7r|||g = go 
09 



Olog ||r ra 7r|||g = go 
09 



Ih(9°). 



In this paper we represent the log likelihood function as an additive functional of the Markov chain $\{((X_n, \xi_n), M_n), n \ge 0\}$ in (5.7), and then apply the strong law of large numbers for Markovian iterated random functions given in Theorem 1(iv) to have, with probability 1,



2 



lim — - 

n->oo n 09j 09, 



■iog||Pe(60 



'Pe(£i)°Pfl(£oH = 



Hence, under Assumptions 1-4 of [6], $\mathbf{I}(\theta)$ is well defined and is equal to $I_H(\theta)$. The moment condition in Assumption 4 of [6] can be relaxed to the following: there exists a $\delta > 0$ with $\rho_0(\xi) := \sup_{|\theta - \theta^0| < \delta} \max_{x,y \in \mathcal{X}} f(\xi; \theta | x)/f(\xi; \theta | y)$ such that $\sup_{x \in \mathcal{X}} P^{\theta^0}\{\rho_0(\xi_1) = \infty \mid X_0 = x\} < 1$; see [7].

Lemma 6. Assume C1-C5 hold. Let $l_j'(\theta_0) = \partial l(\theta)/\partial\theta_j |_{\theta = \theta_0}$. Then, as $n \to \infty$,

$$\frac{1}{\sqrt{n}} (l_j'(\theta_0))_{j=1,\ldots,q} \longrightarrow N(0, \mathbf{I}(\theta_0)) \quad \text{in distribution}. \tag{5.10}$$

Theorem 5. Assume C1-C5 hold. Then there exists a sequence of solutions $\hat\theta_n$ of (5.1) such that $\hat\theta_n \to \theta_0$ in probability. Furthermore, $\sqrt{n}(\hat\theta_n - \theta_0)$ is asymptotically normally distributed with mean zero and variance-covariance matrix $\mathbf{I}^{-1}(\theta_0)$.

Since the proof of Theorem 5 follows a standard argument, we will not 
give it here. 

Corollary 2. Under the assumptions of Theorem 5, if the likelihood equation has a unique root for each $n$ and all $\xi_1, \ldots, \xi_n$, then there is a consistent sequence of estimators $\hat\theta_n$ of the unknown parameters $\theta_0$.

Next, we prove strong consistency of the MLE when the log likelihood 
function is integrable. A crucial step is to give an appropriate definition 
of the Kullback-Leibler information for state space models, so that we can 
apply Theorem 1 to have a standard argument of strong consistency for the 
MLE. Here, we define the Kullback-Leibler information as 



^o^) = E^(log. l|Pe( ' (ei)oP9o(e ° ) ^' 1 



,Pfl(6)°P«(£o)7re 

(5.11) 

fi ll p eo(6)°Peo(Co)vre || , , . . 

J \\r0{€l)or e {€o)no\\ 

Theorem 6. Assume that C1, C6-C9 hold and let $\hat\theta_n$ be the MLE based on $n$ observations $\xi_0, \xi_1, \ldots, \xi_n$. Then $\hat\theta_n \to \theta_0$ $P^{\theta_0}$-a.s. as $n \to \infty$.

Since the proof of Theorem 6 follows a standard argument, we will not 
give it here. 

To derive the Edgeworth expansion for the MLE, we need to define the following notation and assumptions first. For nonnegative integral vectors $\nu = (\nu^{(1)}, \ldots, \nu^{(q)})$, write $|\nu| = \nu^{(1)} + \cdots + \nu^{(q)}$, $\nu! = \nu^{(1)}! \cdots \nu^{(q)}!$, and let $D^\nu = (D_1)^{\nu^{(1)}} \cdots (D_q)^{\nu^{(q)}}$ denote the $\nu$th derivative with respect to $\theta$. Suppose assumptions C2, C3, C4 and C5 are strengthened, for some $r \ge 3$, as follows.

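C2'. The true parameter $\theta_0$ is an interior point of $\Theta$. For all $x \in \mathcal{X}$, $s_0, s_1 \in \mathbf{R}^d$, $\theta \in \Theta \subset \mathbf{R}^q$, the partial derivatives

$$D^1 f(s_0; \theta | x),\; D^2 f(s_0; \theta | x),\; \ldots,\; D^r f(s_0; \theta | x),$$

as well as the partial derivatives

$$D^1 f(s_1; \theta | x, s_0),\; D^2 f(s_1; \theta | x, s_0),\; \ldots,\; D^r f(s_1; \theta | x, s_0),$$

exist, and for all $x, y \in \mathcal{X}$, $\theta \to p_\theta(x, y)$ and $\theta \to \pi_\theta(x)$ have $r - 1$ continuous derivatives in some neighborhood $N_\delta(\theta_0) := \{\theta : |\theta - \theta_0| < \delta\}$ of $\theta_0$.

C3'.

$$\int_{\mathcal{X}} \sup_{\theta \in N_\delta(\theta_0)} |D^1 \pi_\theta(x)|\, m(dx) < \infty,\; \ldots,\; \int_{\mathcal{X}} \sup_{\theta \in N_\delta(\theta_0)} |D^{r-1} \pi_\theta(x)|\, m(dx) < \infty,$$

and for all $x \in \mathcal{X}$,

$$\int_{\mathcal{X}} \sup_{\theta \in N_\delta(\theta_0)} |D^1 p_\theta(x, y)|\, m(dy) < \infty,\; \ldots,\; \int_{\mathcal{X}} \sup_{\theta \in N_\delta(\theta_0)} |D^{r-1} p_\theta(x, y)|\, m(dy) < \infty.$$

C4'. For all $x \in \mathcal{X}$, $s_0 \in \mathbf{R}^d$ and $\theta \in \Theta$,

$$E_x^\theta |D^\nu f(\xi_0; \theta | x)|^r < \infty, \qquad E_{(x,s_0)}^\theta |D^\nu f(\xi_1; \theta | x, s_0)|^r < \infty,$$

for $1 \le |\nu| \le r$, and

$$E_x^\theta \Bigl( \sup_{\theta \in N_\delta(\theta_0)} |D^\nu f(\xi_0; \theta | x)|^r \Bigr) < \infty, \qquad E_{(x,s_0)}^\theta \Bigl( \sup_{\theta \in N_\delta(\theta_0)} |D^\nu f(\xi_1; \theta | x, s_0)|^r \Bigr) < \infty,$$

for $|\nu| = r + 1$.

C5'.

$$\sup_{x \in \mathcal{X}} E_x^{\theta_0} \left( \sup_{|\theta - \theta_0| < \delta}\; \sup_{y,z \in \mathcal{X}} \frac{f(\xi_0; \theta | y)\, f(\xi_1; \theta | y, \xi_0)}{f(\xi_0; \theta | z)\, f(\xi_1; \theta | z, \xi_0)} \right)^r < \infty.$$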

We will assume conditions (4.16) and (4.17) hold for $Z_1^{(\nu)} := D^\nu \log p_1(\xi_0, \xi_1; \theta_0)$, $1 \le |\nu| \le r$. Let $Z_j := \{Z_j^{(\nu)} : 1 \le |\nu| \le r\}$ be $p$-dimensional random vectors for $j \ge 1$, where $p$ is the number of all distinct multi-indices $\nu$, $1 \le |\nu| \le r$. In the following, denote $\bar{Z} = (1/n) \sum_{k=1}^n Z_k$.

Use a standard argument involving the sign change of a continuous function, or a fixed point theorem in the multi-parameter case (cf. [4]), to prove that the likelihood equation has a solution which converges in probability to $\theta_0$. Note that the following notation is interpreted in the multi-dimensional sense. Applying the moderate deviation result on $\bar{Z}$ in Theorem 4, it is possible to ensure that, with $P^{\theta_0}$-probability $1 - o(n^{-1})$, $\hat\theta_n$ satisfies the likelihood equation and lies in $(\theta_0 \pm \log n/\sqrt{n})$. It is this solution we take as our $\hat\theta_n$. If the likelihood equation has multiple roots, assume we have a consistent estimator $T_n$ such that $T_n$ lies in $(\theta_0 \pm \log n/\sqrt{n})$ with $P^{\theta_0}$-probability $1 - o(n^{-1})$. In this case, we may take the solution nearest to $T_n$. By the preceding reasoning, this solution, which is identifiable from the sample, will lie in $(\theta_0 \pm \log n/\sqrt{n})$ with $P^{\theta_0}$-probability $1 - o(n^{-1})$.

Clearly, with $\hat\theta_n$ as above, with probability $1 - o(n^{-1})$,

$$0 = \bar{Z}^{(e_s)} + \sum_{|\nu|=1}^{r-1} \frac{1}{\nu!}\, \bar{Z}^{(e_s + \nu)} (\hat\theta_n - \theta_0)^\nu + R_{n,s}(\hat\theta_n), \qquad 1 \le s \le q, \tag{5.12}$$

where $e_s$ has 1 as the $s$th coordinate and zeros otherwise.

We rewrite equation (5.12) as

$$0 = A(\bar{Z}, \hat\theta_n) + R_n. \tag{5.13}$$

Note = A(^(9q),9q) and %j\i(o ),o = —(Fisher information) ^ 0. 

Hence, by the implicit function theorem, there are a neighborhood $N$ of $\gamma$ and $q$ uniquely defined real-valued infinitely differentiable functions $g_i$ $(1 \le i \le q)$ on $N$ such that $\theta = \mathbf{g}(z) = (g_1(z), \ldots, g_q(z))$ satisfies (5.13). This implies, with probability $1 - o(n^{-1})$, $|\hat\theta_n - \theta_0| < K(\log n/\sqrt{n})$.

To derive the asymptotic expansion of $P^{\theta_0}\{\sqrt{n}(\hat\theta_n - \theta_0) \in B\}$, note that $\hat\theta_n = \mathbf{g}(\bar{Z})$, where $\mathbf{g} : \mathbf{R}^p \to \mathbf{R}^q$ is sufficiently smooth in some neighborhood of $\gamma$. For the case of i.i.d. $\xi_n$, Bhattacharya and Ghosh [4] made use of the Edgeworth expansion of the distribution of $(S_n - n\gamma)/\sqrt{n}$ to derive an Edgeworth expansion of the distribution of $\sqrt{n}\{\mathbf{g}(n^{-1}S_n) - \mathbf{g}(\gamma)\}$. Making use of Theorem 4 and a straightforward extension of their argument, we can generalize their result to have the following theorem.



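Theorem 7. Assume C1, C2'-C5' hold for some $r \ge 3$. Assume (4.16) and (4.17) hold. Let $J_{\mathbf{g}} = (D_j g_i(\gamma))_{1 \le i \le q,\, 1 \le j \le p}$ be the $q \times p$ Jacobian matrix and let $V(\mathbf{g}) = J_{\mathbf{g}} V J_{\mathbf{g}}'$. Then there exists a sequence of solutions $\hat\theta_n$ of (5.1), and there exist polynomials $p_j$ in $q$ variables $(1 \le j \le r - 2)$ such that

$$\sup_{B \in \mathcal{B}_{a,c}} \left| P^{\theta_0}\{\sqrt{n}(\hat\theta_n - \theta_0) \in B\} - \int_B \Bigl\{ \phi_{V(\mathbf{g})}(y) + \sum_{j=1}^{r-2} n^{-j/2} \phi_{j,V,\mathbf{g}}(y) \Bigr\}\, dy \right| = o(n^{-(r-2)/2}),$$

where $\phi_{j,V,\mathbf{g}} = \tilde\pi_{j,\mathbf{g}}(-D)\phi_V$ and $\tilde\pi_{j,\mathbf{g}}(y)$ is a polynomial in $y$ $(\in \mathbf{R}^p)$ whose coefficients are smooth functions of the partial derivatives of $\Lambda(\alpha)$ at $\alpha = 0$ up to order $j + 2$, and those of $\nu_\alpha Q_\alpha h_1$ at $\alpha = 0$ up to order $j$, together with those of $\mathbf{g}$ at $\gamma$ up to order $j + 1$.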



The application of Theorem 7 to third-order efficiency for the MLE and third-order efficient approximate solution of the likelihood equation follows directly from [28].

6. Examples. From a theoretical point of view, Theorems 5-7 are adequate for state space model estimation problems in providing assurance of the existence of efficient estimators, characterizing them as solutions of likelihood equations and prescribing their asymptotic behavior. In practice, however, one must still contend with certain statistical and numerical difficulties, such as implementation of the maximum likelihood estimator. In this section we apply our results to study some examples which include Markov switching models, ARMA models, (G)ARCH models and SV models. For simplicity, in most of these examples we consider only the normal error assumption. Although strong consistency and asymptotic normality of the MLE in ARMA and GARCH($p$, $q$) models are known in the literature, we provide alternative proofs in the framework of state space models. Furthermore, we can apply Theorem 7 to obtain an Edgeworth expansion for the MLE. To the best of our knowledge, the asymptotic normality of the MLE in the AR(1)/ARCH(1) model, considered in Section 6.3, is new. The results on asymptotic properties of the MLE in stochastic volatility models not only provide theoretical justification, but also give some insight into the structure of the likelihood function, which can be used for further study.

6.1. Markov switching models. We start with a simple real-valued fourth-order autoregression around one of two constants, $\mu_1$ or $\mu_2$:

$$\xi_n - \mu_{X_n} = \sum_{k=1}^{4} \varphi_k(\xi_{n-k} - \mu_{X_{n-k}}) + \varepsilon_n, \tag{6.1}$$

where $\varepsilon_n \sim N(0, \sigma^2)$, and $\{X_n, n \ge 0\}$ is a two-state Markov chain. This model was studied by Hamilton [33] in order to analyze the behavior of U.S. real GNP. To apply our theory in the form of (6.1), we consider a simple case of order 1 in (6.1). In this case, the likelihood function for given $X_n = x_n$, $n \ge 0$, is

$$f(\xi_n | x_n; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\bigl( -[(\xi_n - \mu_{x_n}) - \varphi_1(\xi_{n-1} - \mu_{x_{n-1}})]^2/(2\sigma^2) \bigr). \tag{6.2}$$

Denote by $[p_{xy}]_{x,y=1,2}$ the transition probability of the underlying Markov chain $\{X_n, n \ge 0\}$ and let $\theta = (p_{11}, p_{21}, \varphi_1, \mu_1, \mu_2, \sigma^2)$ be the unknown parameter. Assume that $|\varphi_1| < 1$, and that there exists a constant $c > 0$ such that $\sigma^2 > c$. Moreover, we assume that $\mu_1 \ne \mu_2$ such that the identifiability condition C6 holds. Since the state space of $X_n$ is finite, we consider $0 < p_{xy} < 1$ for all $x, y = 1, 2$, and let $w(x) = |x| + 1$ such that the condition C1 holds. Under the normal distribution assumption, it is easy to see that conditions C2-C4 and C7-C9 are satisfied in this model. To check that C5 holds, note that condition C5 reduces to

$$\sup_{x \in \mathcal{X}} E_x^{\theta_0} \left( \sup_{|\theta - \theta_0| < \delta}\; \max_{y,z \in \mathcal{X}} \frac{f(\xi_0; \theta | y)\, f(\xi_1; \theta | y, \xi_0)}{f(\xi_0; \theta | z)\, f(\xi_1; \theta | z, \xi_0)} \right)^2 < \infty. \tag{6.3}$$

Since the maximum over $x$, $y$ and $z$ is applied to a finite set $\mathcal{X}$, and $f$ defined in (6.2) is a normal density, it is easy to check that (6.3) is satisfied.

When $\xi_n = X_n$ as in (6.1), that is, $\mu_1 = \mu_2 = \mu$ are given, this reduces to the classical autoregressive model with unknown parameters $\theta = (\varphi_1, \ldots, \varphi_4, \sigma^2)$.
The Fisher information matrix is then given by 



2(a 4 )- 



where $\Gamma = (\gamma_{i-j})_{4 \times 4}$ for $1 \le i, j \le 4$ with $\gamma_k = E X_n X_{n+k}$. A simple calculation shows that (5.9) reduces to (6.4) in this case. When $\varphi_k = 0$ as in (6.1), this is the hidden Markov model with normal mixture distributions considered in Example 1 of [7].

6.2. ARMA models. We start with a univariate Gaussian causal ARMA($p$, $q$) model which can be written as a state space model by defining $r = \max\{p, q + 1\}$,

$$\xi_n - \mu = \alpha_1(\xi_{n-1} - \mu) + \alpha_2(\xi_{n-2} - \mu) + \cdots + \alpha_r(\xi_{n-r} - \mu) + \varepsilon_n + \beta_1 \varepsilon_{n-1} + \cdots + \beta_{r-1} \varepsilon_{n-r+1}, \tag{6.5}$$

where $\alpha_j = 0$ for $j > p$ and $\beta_j = 0$ for $j > q$. Furthermore, we assume the $\varepsilon_n$ are i.i.d. random variables with distribution $N(0, \sigma^2)$. Asymptotic properties of the MLE in the ARMA model can be found in [35] and [53]. A general treatment of the MLE in the Gaussian ARMAX model can be found in Chapter 7 of [11].

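By using the same idea as that in [34], we consider the following state space representation of (6.5):

$$X_{n+1} = \begin{bmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_{r-1} & \alpha_r \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} X_n + \begin{bmatrix} \varepsilon_{n+1} \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{6.6}$$

and

$$\xi_n = \mu + [1\;\; \beta_1\;\; \beta_2\;\; \cdots\;\; \beta_{r-1}]\, X_n. \tag{6.7}$$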
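As a check on the state space form, the sketch below runs a companion-matrix recursion of the kind used in (6.6) and (6.7) alongside the ARMA recursion (6.5) with the same shocks and zero pre-sample values, and compares the two outputs. It works with the centred series $\xi_n - \mu$; the coefficient values and helper names are illustrative assumptions.

```python
import random


def arma_direct(alpha, beta, eps):
    """Centred series from the ARMA recursion with zero pre-sample values.

    alpha has length r and beta has length r - 1 (padded with zeros).
    """
    r = len(alpha)
    y = []
    for n in range(len(eps)):
        ar = sum(alpha[j] * y[n - 1 - j] for j in range(r) if n - 1 - j >= 0)
        ma = sum(beta[j] * eps[n - 1 - j] for j in range(r - 1) if n - 1 - j >= 0)
        y.append(ar + eps[n] + ma)
    return y


def arma_state_space(alpha, beta, eps):
    """Centred series from the companion-form state recursion.

    The state holds (z_n, z_{n-1}, ..., z_{n-r+1}), where z follows the pure
    autoregression driven by eps; the observation is z_n + sum_j beta_j z_{n-j}.
    """
    r = len(alpha)
    state = [0.0] * r
    out = []
    for e in eps:
        first = sum(alpha[j] * state[j] for j in range(r)) + e
        state = [first] + state[:-1]          # shift, as the sub-diagonal of ones does
        out.append(state[0] + sum(beta[j] * state[j + 1] for j in range(r - 1)))
    return out


rng = random.Random(12345)
eps = [rng.gauss(0.0, 1.0) for _ in range(200)]

# ARMA(2, 1): p = 2, q = 1, so r = max(p, q + 1) = 2.
alpha, beta = [0.5, -0.3], [0.4]
max_gap = max(abs(u - v) for u, v in
              zip(arma_direct(alpha, beta, eps), arma_state_space(alpha, beta, eps)))
print(max_gap)
```

The two recursions agree up to rounding because the observation applies the moving-average polynomial to a state that already satisfies the autoregressive part.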



Assume that the roots of $1 - \alpha_1 z - \alpha_2 z^2 - \cdots - \alpha_p z^p = 0$ lie outside the unit circle. It is easy to see that $\{X_n, n \ge 0\}$ forms a $w$-uniformly ergodic Markov chain with $w(x) = \|x\|^2$ (cf. Theorem 16.5.1 in [46]). And the $\xi_n$ are conditionally independent given $\{X_n, n \ge 0\}$. Since the verification of the weighted mean contraction property and the weighted moment assumption is the same as those in Remark 2(b), it will not be repeated here. This implies that condition C1 holds. The assumption $\varepsilon_n \sim N(0, \sigma^2)$ also implies that conditions C2-C5, C2'-C5' and C7-C9 are satisfied in model (6.5). Since the verification is straightforward, we do not report it here. Suppose the conditional distribution of $\xi_n$ given $X_0, \ldots, X_n$ is of the form $F_{X_{n-1}, X_n}$ from (6.7).

The Cramér conditions (4.16) and (4.17) hold for $Z_1^{(\nu)} := D^\nu \log p_1(\xi_0, \xi_1; \theta_0)$, since the conditional density of $\xi_n$ given $\{X_n, n \ge 0\}$ is $N(0, \sigma^2)$ and



lim sup 

|0|->O 



<1, 



where $\varphi(\cdot)$ is the normal density function of $\varepsilon_1$, and $\pi$ is the stationary distribution of $\{X_n\}$. The identification issue in C6 can be found in Chapter 9 of [11] or Chapter 13 of [34].

6.3. (G)ARCH models. In this subsection we study two specific (G)ARCH models. To start with, we consider the AR(1)/ARCH(1) model

$$X_n = \beta_0 + \beta_1 X_{n-1} + \sqrt{\alpha_0 + \alpha_1 X_{n-1}^2}\; \varepsilon_n, \tag{6.9}$$

where $\alpha_i, \beta_i$ are unknown parameters for $i = 0, 1$ with $\alpha_0 > 0$, $0 < \alpha_1 < 1$, $3\alpha_1^2 < 1$ and $0 \le \beta_1 < 1$. Here the $\varepsilon_n$ are i.i.d. random variables with the standard normal distribution. Note that in (6.9) $X = (X_n)$ is defined as the autoregressive scheme AR(1) with ARCH(1) noise $(\sqrt{\alpha_0 + \alpha_1 X_{n-1}^2}\; \varepsilon_n)_{n \ge 1}$. When $\beta_0 = \beta_1 = 0$, this is the classical ARCH(1) model first considered by Engle [19].

Model (6.9) is conditionally Gaussian, and therefore the likelihood function of the parameter $\theta = (\alpha_0, \alpha_1, \beta_0, \beta_1)$ for given observations $x = (x_0 = 0, x_1, \ldots, x_n)$ from (6.9) is

$$l(x; \theta) = (2\pi)^{-n/2} \prod_{k=1}^{n} (\alpha_0 + \alpha_1 x_{k-1}^2)^{-1/2} \exp\left\{ -\frac{1}{2} \sum_{k=1}^{n} \frac{(x_k - \beta_0 - \beta_1 x_{k-1})^2}{\alpha_0 + \alpha_1 x_{k-1}^2} \right\}. \tag{6.10}$$

Assume $\beta_0 = 0$ and $\alpha_0, \alpha_1$ are given. The maximum likelihood estimator $\hat\beta_1$ of $\beta_1$ is the root of the equation $\partial l(x; \theta)/\partial\beta_1 = 0$. In view of (6.9) and (6.10), we obtain

$$\hat\beta_1 = \frac{\sum_{k=1}^{n} (x_k - \beta_0)\, x_{k-1}/(\alpha_0 + \alpha_1 x_{k-1}^2)}{\sum_{k=1}^{n} x_{k-1}^2/(\alpha_0 + \alpha_1 x_{k-1}^2)}. \tag{6.11}$$
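The closed form (6.11) is a weighted least squares ratio and is easy to try on simulated data. The sketch below simulates the AR(1)/ARCH(1) recursion (6.9) with $x_0 = 0$ and known $\beta_0$, $\alpha_0$, $\alpha_1$, then evaluates the estimator and the corresponding score. The parameter values, sample size and function names are illustrative assumptions.

```python
import math
import random


def simulate_ar1_arch1(n, beta0, beta1, alpha0, alpha1, seed=0):
    """Simulate x_0 = 0, x_1, ..., x_n from the AR(1)/ARCH(1) recursion."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        prev = x[-1]
        scale = math.sqrt(alpha0 + alpha1 * prev * prev)
        x.append(beta0 + beta1 * prev + scale * rng.gauss(0.0, 1.0))
    return x


def beta1_mle(x, beta0, alpha0, alpha1):
    """Weighted least squares ratio for beta1 with the other parameters known."""
    num = den = 0.0
    for k in range(1, len(x)):
        w = 1.0 / (alpha0 + alpha1 * x[k - 1] ** 2)
        num += (x[k] - beta0) * x[k - 1] * w
        den += x[k - 1] ** 2 * w
    return num / den


def score_beta1(x, beta0, beta1, alpha0, alpha1):
    """Derivative of the Gaussian log likelihood with respect to beta1."""
    return sum((x[k] - beta0 - beta1 * x[k - 1]) * x[k - 1]
               / (alpha0 + alpha1 * x[k - 1] ** 2) for k in range(1, len(x)))


beta0, beta1, alpha0, alpha1 = 0.0, 0.5, 1.0, 0.3   # note 3 * alpha1**2 < 1
x = simulate_ar1_arch1(20000, beta0, beta1, alpha0, alpha1, seed=2024)
estimate = beta1_mle(x, beta0, alpha0, alpha1)
score_at_estimate = score_beta1(x, beta0, estimate, alpha0, alpha1)
print(estimate, score_at_estimate)
```

The score vanishes at the estimate up to rounding, which confirms that the ratio solves the likelihood equation in $\beta_1$ for this sample.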



Meyn and Tweedie [46], pages 380 and 383, establish $w$-uniform ergodicity [with $w(x) = |x| + 1$] of the AR(1) model $X_n = \beta_0 + \beta_1 X_{n-1} + \varepsilon_n$ by proving that a drift condition is satisfied, where $|\beta_1| < 1$ and the $\varepsilon_n$ are i.i.d. random variables, with $E|\varepsilon_n| < \infty$, whose common density function $q$ with respect to Lebesgue measure is positive everywhere. The strongly nonlattice condition holds as in model (6.5). By using an argument similar to Theorem 1 of [45], we have the asymptotic identifiability of the likelihood function (6.10). Letting $\xi_n = X_n$, and using an argument similar to that in Remark 2(b), condition C1 holds. The verification of conditions C2-C9 and C2'-C5' is straightforward and tedious, and is thus omitted. By Theorems 5-7, we have the strong consistency, asymptotic normality and Edgeworth expansion of the MLE $\hat\beta_1$. The asymptotic properties of the MLE of $\beta_0$, $\alpha_0$ and $\alpha_1$ can be verified in a similar way.

Next, we consider the GARCH($p,q$) model of (1.1) in Example 1. It is
known that the necessary and sufficient condition for (1.1) to define a unique
strictly stationary process $\{Y_n, n \ge 0\}$ with $EY_n^2 < \infty$ is
condition (6.12). We assume (6.12) holds.

Similar to the estimation for ARMA models, the most frequently used
estimators for GARCH models are those derived from a (conditional) Gaussian
likelihood function (cf. [20]). Without assuming normality of $\varepsilon_n$
in (1.1), and imposing the moment condition $E(\varepsilon_n^4) < \infty$, Hall and Yao
[32] established the asymptotic normality of the conditional maximum
likelihood estimator in GARCH($p,q$). They also established asymptotic
results for the case in which the error distribution is heavy-tailed. Earlier in
the literature, when $p = q = 1$, Lee and Hansen [43] and Lumsdaine [45]
proved, under some regularity conditions, the consistency and asymptotic
normality of the quasi-maximum likelihood estimator in the GARCH(1,1)
model.

By using the state space representation (1.2) and (1.3), it is known (cf.
Theorem 3.2 of [1]) that the Markov chain $\{X_n, n \ge 0\}$ defined in (1.3) is
stationary if and only if the top Lyapunov exponent $\gamma$ of $A_n$ is strictly
negative. It is easy to see that $\{X_n, n \ge 0\}$ is an aperiodic, irreducible and
$w$-uniformly [with $w(x) = \|x\|^2$] ergodic Markov chain. Furthermore, we
assume $\varepsilon_n$ are i.i.d. random variables with distribution $N(0,\sigma^2)$. An argument
similar to that in Remark 2(b) shows that condition C1 holds. The normal
error assumption also implies that conditions C2-C5, C2'-C5' and C7-C9
are satisfied in model (1.3). When $p = q = 1$, Theorem 1 of [45] proves the
asymptotic identifiability of the likelihood function.
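
As an illustration of the Lyapunov exponent criterion in the simplest case, the following sketch estimates the exponent by Monte Carlo for a GARCH(1,1) recursion. It assumes the parametrization $\sigma_n^2 = \omega + \alpha Y_{n-1}^2 + \beta\sigma_{n-1}^2$ with standard normal innovations, in which case $A_n$ reduces to the scalar $\beta + \alpha\varepsilon_n^2$ and $\gamma = E\log(\beta + \alpha\varepsilon_1^2)$. The function name `garch11_lyapunov` and its arguments are hypothetical and do not come from the paper.

```python
import math
import random


def garch11_lyapunov(alpha, beta, n_draws=20000, seed=0):
    """Monte Carlo estimate of E[log(beta + alpha * eps^2)], eps ~ N(0, 1).

    Assumption (not from the paper): GARCH(1,1) with unit-variance normal
    innovations, so the random matrix A_n is the scalar beta + alpha * eps^2.
    Requires beta > 0 so that the logarithm is always defined.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        eps = rng.gauss(0.0, 1.0)
        total += math.log(beta + alpha * eps * eps)
    return total / n_draws


if __name__ == "__main__":
    # A negative estimate is consistent with strict stationarity.
    print(garch11_lyapunov(0.1, 0.8))
    # Here every summand is nonnegative, so the estimate cannot be negative.
    print(garch11_lyapunov(3.0, 1.0))
```

A negative value of the estimate corresponds to the stationary regime described above; the second call shows a parameter choice for which the criterion fails.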

6.4. Stochastic volatility models. Consider the stochastic volatility model
(1.4)-(1.8). To check that condition C1 holds, we note that one can take
$w(x) = |x| + 1$ in the AR(1) model $X_n = \alpha X_{n-1} + \eta_n$, as follows by verifying
a drift condition, where $|\alpha| < 1$ and the $\eta_n$ are i.i.d. random variables, with
$E|\eta_1| < \infty$, whose common density function $q$ with respect to Lebesgue
measure is positive everywhere. Since $\varepsilon_n \sim N(0,1)$, $\zeta_n = \log\varepsilon_n^2$,
$\eta_n \sim N(0,\sigma^2)$, and $\zeta_n$ and $\eta_n$ are mutually independent, an argument similar
to that in Remark 2(b) shows that the rest of condition C1 holds. Conditions
C2-C5, C2'-C5' and C7-C9 are also satisfied in model (1.5) and (1.6) (cf. pages
22-23 of [50]). Denote $\xi_n := \log Y_n^2$. Note that the conditional density of $X_n$
exists, and this implies that the conditional distribution of $\xi_n$ given
$X_0,\dots,X_n$ is of the form $F_{X_{n-1},X_n}$ such that

(6.13)  $\limsup_{|t|\to\infty}\left|\int\!\!\int\!\!\int e^{its}\,dF_{x,z}(s)\,\varphi(z)\,dz\,\pi(dx)\right| < 1,$

where $\varphi(\cdot)$ is the normal density function of $\zeta_1$ and $\pi$ is the stationary
distribution of $\{X_n\}$. Let $S_n = \sum_{i=1}^{n}\xi_i$, $S_0 = 0$. Then $\{(X_n,S_n), n \ge 0\}$ is
strongly nonlattice. To check the identification condition C6, the reader is
referred to Chapter 13 of [34] and Section 2.4.3 of [29].

Next, we assume that $\varepsilon_n \sim N(0,1)$, $\zeta_n = \log\varepsilon_n^2$ and $\eta_n$ is a sequence of
i.i.d. double exponential(1) random variables. Furthermore, we assume $\zeta_n$
and $\eta_n$ are mutually independent. By using an argument similar to that in
Remark 2(b), condition C1 holds. Simple calculations also show that conditions
C2-C5, C2'-C5' and C7-C9 hold in this case. Under the assumption that
the conditional distribution of $\xi_n$ given $X_0,\dots,X_n$ is of the form $F_{X_{n-1},X_n}$
such that (6.13) holds, $\{(X_n,S_n), n \ge 0\}$ is strongly nonlattice.

Without the normal assumption, quasi-maximum likelihood (QML)
estimators of the parameters are obtained by treating $\zeta_n$ and $\eta_n$ as though
they were normal and maximizing the prediction error decomposition form
of the likelihood obtained via the Kalman filter or implied volatility. That is,
we assume that $\zeta_n$ is a sequence of independent and identically distributed
$N(0,\sigma_\zeta^2)$ random variables. For given observations $y = (\log y_1^2,\dots,\log y_n^2)$
from (1.5) and (1.6), the likelihood function of the parameter
$\theta = (\alpha,\sigma^2,\sigma_\zeta^2)$ is

(6.14)  $$l(y;\theta) = \int_{x_0\in\mathcal{X}}\cdots\int_{x_n\in\mathcal{X}} \pi(x_0)\,(2\pi\sigma_\zeta^2)^{-n/2}\prod_{k=1}^{n}p(x_{k-1},x_k)\,\exp\left\{-\frac{1}{2}\sum_{k=1}^{n}\frac{(\log y_k^2-\omega-x_k)^2}{\sigma_\zeta^2}\right\}dx_0\cdots dx_n,$$

where $p(x_{k-1},x_k)$ is defined in (1.7). By using the results of [16], Harvey,
Ruiz and Shephard [36] showed that the quasi-maximum likelihood estimators
are asymptotically normal under some regularity conditions. Further
study of the MLE in stochastic volatility models will be published in a
separate paper.
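
The prediction error decomposition mentioned above can be sketched as follows. This is a minimal illustration rather than the procedure of [36]: it assumes the linear form $\log y_k^2 = \omega + x_k + \zeta_k$, $x_k = \alpha x_{k-1} + \eta_k$ with $|\alpha| < 1$, treats both noises as Gaussian, and starts the filter from the stationary law of the hidden state. The function `sv_quasi_loglik` and its argument names are hypothetical.

```python
import math


def sv_quasi_loglik(obs, alpha, sigma_eta2, sigma_zeta2, omega):
    """Gaussian quasi log-likelihood via the Kalman filter.

    Assumed model (illustrative only):
        obs[k] = omega + x_k + zeta_k,   zeta_k ~ N(0, sigma_zeta2)
        x_k    = alpha * x_{k-1} + eta_k, eta_k ~ N(0, sigma_eta2), |alpha| < 1
    """
    # Stationary mean and variance of the hidden AR(1) state.
    x_pred = 0.0
    p_pred = sigma_eta2 / (1.0 - alpha * alpha)
    loglik = 0.0
    for y in obs:
        # One-step-ahead prediction error and its variance.
        v = y - omega - x_pred
        f = p_pred + sigma_zeta2
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        # Measurement update, then propagate one step through the AR(1).
        gain = p_pred / f
        x_filt = x_pred + gain * v
        p_filt = p_pred * (1.0 - gain)
        x_pred = alpha * x_filt
        p_pred = alpha * alpha * p_filt + sigma_eta2
    return loglik


if __name__ == "__main__":
    sample = [0.1, -0.4, 0.7, 0.2, -1.1]
    print(sv_quasi_loglik(sample, alpha=0.9, sigma_eta2=0.1,
                          sigma_zeta2=4.9, omega=-1.3))
```

Maximizing this function over the parameters with a numerical optimizer would give a QML estimate under the stated assumptions; the sample values and parameter values in the usage lines are arbitrary.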

7. Proofs of Lemmas 3-6. For convenience of notation, denote
$\{Z_n, n \ge 0\} := \{((X_n,\xi_n),M_n), n \ge 0\}$ as the Markov chain induced by the Markovian
iterated random functions system (2.4)-(2.7) on the state space
$(\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}$. In the proof of Lemma 3, we omit $\theta$ in $P_\theta(\cdot)$ for simplicity.

Proof of Lemma 3. We consider only the case of $P(\xi_1)$, since the
cases of $P(\xi_0)$ and $P(\xi_j)$, for $j = 2,\dots,n$, are straightforward consequences.
For any two elements $h_1, h_2 \in \mathbf{M}$, and two fixed elements $s_0, s_1 \in \mathbf{R}^d$, by (5.8)
we have

$$\begin{aligned}
d(P(s_1)h_1,P(s_1)h_2)
&= \sup_{x_0\in\mathcal{X}}\left|\int p_\theta(x_0,x_1)f(s_1;\theta|x_1,s_0)h_1(x_1)\,m(dx_1) - \int p_\theta(x_0,x_1)f(s_1;\theta|x_1,s_0)h_2(x_1)\,m(dx_1)\right| \\
&\le d(h_1,h_2)\sup_{x_0\in\mathcal{X}}\int p_\theta(x_0,x_1)f(s_1;\theta|x_1,s_0)\,m(dx_1) \\
&\le C\left(\sup_{x_0\in\mathcal{X}}\int p_\theta(x_0,x_1)\,m(dx_1)\right)d(h_1,h_2),
\end{aligned}$$

where $0 < C = \sup_{x_1\in\mathcal{X}} f(s_1;\theta|x_1,s_0) < \infty$ is a constant by assumption C1.
Note that $\sup_{x_0\in\mathcal{X}}\int p_\theta(x_0,x_1)\,m(dx_1) = 1$. The equality holds only if
$h_1 = h_2$ $m$-almost surely. This proves the Lipschitz continuity condition in the
second argument.

Note that C1 implies that Assumption K1 holds. Recall that
$M_n = P(\xi_n)\circ\cdots\circ P(\xi_1)\circ P(\xi_0)$ in (5.6). To prove the weighted mean
contraction property K2, we observe that, for $p \ge 1$,

(7.1)
$$\begin{aligned}
\sup_{x_0,s_0}E_{(x_0,s_0)}\left\{\log\left(L_p\,\frac{w(X_p,\xi_p)}{w(x_0,s_0)}\right)\right\}
&= \sup_{x_0,s_0}E_{(x_0,s_0)}\left\{\log\left(\sup_{h_1\ne h_2}\frac{d(M_ph_1,M_ph_2)}{d(h_1,h_2)}\,\frac{w(X_p,\xi_p)}{w(x_0,s_0)}\right)\right\} \\
&\le \sup_{x_0,s_0}E_{(x_0,s_0)}\left\{\log\left(\sup_{x_0\in\mathcal{X}}\int p_\theta(x_0,x_1)f(\xi_1;\theta|x_1,s_0)\,m(dx_1)\times\frac{w(X_p,\xi_p)}{w(x_0,s_0)}\right)\right\} < 0.
\end{aligned}$$

The last inequality follows from (5.2) in condition C1.

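To verify that Assumption K3 holds, as $m$ is $\sigma$-finite, we have
$\mathcal{X} = \bigcup_{n=1}^{\infty}\mathcal{X}_n$, where the $\mathcal{X}_n$ are pairwise disjoint and
$0 < m(\mathcal{X}_n) < \infty$. Set

(7.2)  $h(x) = \sum_{n=1}^{\infty}\frac{I_{\mathcal{X}_n}(x)}{2^n\,m(\mathcal{X}_n)}.$

It is easy to see that $\int_{x\in\mathcal{X}} h(x)\,m(dx) = 1$ and, hence, $h$ belongs to $\mathbf{M}$. Observe
that

(7.3)  $E\,d^2(P(\xi_j)h,h) = E\sup_{x_{j-1}\in\mathcal{X}}\left|\int p_\theta(x_{j-1},x_j)f(\xi_j;\theta|x_j,\xi_{j-1})h(x_j)\,m(dx_j) - h(x_{j-1})\right|^2.$

By the definition of $h(x)$ in (7.2), it is piecewise constant, and
$p_\theta(x_{j-1},x_j)f(\xi_j;\varphi_{x_j}(\theta)|\xi_{j-1})$ is a probability density function integrable
over the subset $\mathcal{X}_n$. These imply that (7.3) is finite.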
Finally, we observe

$$\begin{aligned}
\sup_{x_0,s_0}E_{(x_0,s_0)}\left\{L_1\,\frac{w(X_1,\xi_1)}{w(x_0,s_0)}\right\}
&= \sup_{x_0,s_0}E_{(x_0,s_0)}\left\{\sup_{h_1\ne h_2}\frac{d(P(\xi_1)h_1,P(\xi_1)h_2)}{d(h_1,h_2)}\,\frac{w(X_1,\xi_1)}{w(x_0,s_0)}\right\} \\
&\le \sup_{x_0,s_0}E_{(x_0,s_0)}\left\{\sup_{x_0\in\mathcal{X}}\int p_\theta(x_0,x_1)f(\xi_1;\theta|x_1,s_0)\,m(dx_1)\,\frac{w(X_1,\xi_1)}{w(x_0,s_0)}\right\} < \infty.
\end{aligned}$$

The last inequality follows from (5.3) in condition C1.

Note that C5 implies the exponential moment condition of $g$. Hence, the
proof is complete. □

In the proof of Lemma 4 we omit $\theta$ for simplicity.

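Proof of Lemma 4. We first prove that $\{Z_n, n \ge 0\}$ is Harris recurrent.
Note that the transition probability kernel of the Markov chain
$\{(X_n,\xi_n), n \ge 0\}$, defined in (2.1) and (2.2), has a probability density with
respect to $m \times Q$. Moreover, the iterated random functions system, defined in
(2.4)-(2.7), also has a probability density with respect to $Q$. By making use of
the definition (3.2), there exists a measurable function
$g\colon ((\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M})\times((\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}) \to [0,\infty)$ such that

(7.4)  $P(z,dz') = g(z,z')\,(m\times Q\times Q)(dz'),$

where $\int_{(\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}} g(z,z')\,(m\times Q\times Q)(dz') = 1$ for all $z \in (\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}$. For
simplicity of notation, we let $\Lambda(\cdot) := (m\times Q\times Q)(\cdot)$ in the proof. For given
$n \ge 1$, let $P^n(z,\cdot) := P_z(Z_n\in\cdot)$ for $z \in (\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}$. For $A \in \mathcal{B}(\mathcal{X}\times\mathbf{R}^d)$
and $B \in \mathcal{B}(\mathbf{M})$, define

$\Lambda_n(A\times B) := \int P_{z'}\{Z_n\in A\times B\}\,\Lambda(dz').$

Then for all $A \in \mathcal{B}(\mathcal{X}\times\mathbf{R}^d)$ and $B \in \mathcal{B}(\mathbf{M})$,

$$\begin{aligned}
P^{n+1}(z,A\times B) &= \int_{(\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}} P^n(z',A\times B)\,g(z,z')\,\Lambda(dz') \\
&= \int_{(\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}} P_{z'}\{Z_n\in A\times B\}\,g(z,z')\,\Lambda(dz').
\end{aligned}$$

It is easy to see that, for any given $n \ge 1$, the family
$(P^{n+1}(z,\cdot))_{z\in(\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}}$ is absolutely continuous with respect to $\Lambda_n$.
Therefore, by the Radon-Nikodym theorem, $P^n$ has a probability density with
respect to $\Lambda_n$ for all $n \ge 1$. Let $g_n$ be such that

(7.5)  $P^{n+1}(z,dz') = g_n(z,z')\,\Lambda_n(dz'), \qquad z \in (\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M},$

where $\int_{(\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}} g_n(z,z')\,\Lambda_n(dz') = 1$ for all $z \in (\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}$. Note that
$g_1 = g$. It is easy to check that all $\Lambda_n$ are absolutely continuous with respect
to $\Pi$.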

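Denote by $B^c$ the complement of $B$. Since $\Pi(((\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M})^c) = 0$, also
$\Lambda(((\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M})^c) = 0$. Recall that $g$ is defined in (7.4). It is obvious from the
previous considerations that we can choose $\delta > 0$ sufficiently small such that

$$\int\!\!\int\!\!\int 1_{\{g_2\ge\delta\}}(z_1,z_2)\,1_{\{g\ge\delta\}}(z_2,z_3)\,\Lambda(dz_3)\,\Lambda_2(dz_2)\,\Pi(dz_1) > 0,$$

where each integral is over $(\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}$. Hence, by Lemma 4.3 of [48], there
exist a $\Pi$-positive set $\Gamma_1 \subset (\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}$ and a $\Lambda$-positive set
$\Gamma_2 \subset (\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}$ such that

$\alpha := \inf_{z_1\in\Gamma_1,\,z_3\in\Gamma_2}\Lambda_2\{z_2\in(\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}\colon g_2(z_1,z_2)\ge\delta,\ g(z_2,z_3)\ge\delta\} > 0.$

A combination of the above result with (7.4) and (7.5) implies

(7.6)
$$\begin{aligned}
P^3(z_1,A\times B) &= \int_{(\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}} P(z_2,A\times B)\,P^2(z_1,dz_2) \\
&\ge \int_{(\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}} g_2(z_1,z_2)\int_{(A\times B)\cap\Gamma_2} g(z_2,z_3)\,\Lambda(dz_3)\,\Lambda_2(dz_2) \\
&\ge \alpha\delta^2\,\Lambda((A\times B)\cap\Gamma_2)
\end{aligned}$$

for all $z_1 \in \Gamma_1$ and $A\times B \in \mathcal{B}((\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M})$. Therefore, we obtain an
absorbing set such that $\Gamma_1$ is a regeneration set for $\{Z_n, n \ge 0\}$ on
$(\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}$, that is, $\Gamma_1$ is recurrent and satisfies a minorization condition,
namely, (7.6). This proves the Harris recurrence of $\{Z_n, n \ge 0\}$ on
$(\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}$. Since $\{Z_n, n \ge 0\}$ possesses a stationary distribution, it is
clearly positive Harris recurrent.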

Next, we give the proof of aperiodicity. If $\{Z_n, n \ge 0\}$ were $q$-periodic
with cyclic classes $\Gamma_1,\dots,\Gamma_q$, say, then the $q$-skeleton $(Z_{nq})_{n\ge0}$ would have
stationary distributions $\Pi(\cdot\cap\Gamma_k)/\Pi(\Gamma_k)$ for $k = 1,\dots,q$. On the other hand,
$Z_{qn}$ is aperiodic by definition, and $M_{nq}$ is also a Markovian iterated random
functions system of Lipschitz maps, satisfying condition C1, and thus
possesses only one stationary distribution. Consequently, $q = 1$ and
$\{Z_n, n \ge 0\}$ is aperiodic. Since the Markov chain $\{((X_n,\xi_n),M_n), n \ge 0\}$ has a
probability density with respect to $\Lambda$, it is obviously $\Lambda$-irreducible. The proof is
complete. □

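Proof of Lemma 5. In order to define the Fisher information (5.9), we
need to verify that there exists a $\delta > 0$ such that
$\partial\log\|P_\theta(\xi_1)\circ P_\theta(\xi_0)\pi\|/\partial\theta \in L^2(P^\theta)$ for $\theta \in N_\delta(\theta_0)$, a
$\delta$-neighborhood of $\theta_0$. That is, we need to show

(7.7)  $E^\theta\left(\frac{\partial\log\|P_\theta(\xi_1)\circ P_\theta(\xi_0)\pi\|}{\partial\theta}\right)^2 < \infty$

for $\theta \in N_\delta(\theta_0)$.

It is easy to see that C5 implies that

$\left(\frac{\partial\log\int_{y\in\mathcal{X}}\pi(x)p(x,y)f(\xi_0;\theta|x)f(\xi_1;\theta|y,\xi_0)\,m(dy)}{\partial\theta}\right)^2 < \infty$

for $\theta \in N_\delta(\theta_0)$. And this leads to

(7.8)  $\sup_{x\in\mathcal{X}}E_x\left(\frac{\partial\log\int_{y\in\mathcal{X}}\pi(x)p(x,y)f(\xi_0;\theta|x)f(\xi_1;\theta|y,\xi_0)\,m(dy)}{\partial\theta}\right)^2 < \infty$

for $\theta \in N_\delta(\theta_0)$, where $E_x$ is the expectation under $P_x(\cdot,\cdot)$.
Finally, (7.8) implies (7.7) and we have the proof. □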



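Proof of Lemma 6. For each $j = 1,\dots,q$,

$$\begin{aligned}
\frac{1}{\sqrt{n}}\frac{\partial}{\partial\theta_j}l(\theta_0)
&= \frac{1}{\sqrt{n}}\frac{\partial}{\partial\theta_j}\log\|P_\theta(\xi_n)\circ\cdots\circ P_\theta(\xi_1)\circ P_\theta(\xi_0)\pi\|\Big|_{\theta=\theta_0} \\
&= \frac{1}{\sqrt{n}}\sum_{k=1}^{n}\frac{\partial}{\partial\theta_j}\log\frac{\|P_\theta(\xi_k)\circ\cdots\circ P_\theta(\xi_1)\circ P_\theta(\xi_0)\pi\|}{\|P_\theta(\xi_{k-1})\circ\cdots\circ P_\theta(\xi_1)\circ P_\theta(\xi_0)\pi\|}\Big|_{\theta=\theta_0}.
\end{aligned}$$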

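Now, for each $h \in \mathbf{M}$, $\alpha = (\alpha_1,\dots,\alpha_q) \in \mathbf{C}^q$, and a $(\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}$
measurable function $\varphi$ with $\|\varphi\|_{wh} < \infty$, define

(7.9)
$$\begin{aligned}
(T_1(\alpha)\varphi)((x,s),h) = E_{(x,s)}\Big\{&\exp\Big(i(\alpha_1,\dots,\alpha_q)\Big(\frac{\partial}{\partial\theta_1}\log\|P_\theta(\xi_1)\circ P_\theta(\xi_0)h\|,\dots,\frac{\partial}{\partial\theta_q}\log\|P_\theta(\xi_1)\circ P_\theta(\xi_0)h\|\Big)^{t}\Big|_{\theta=\theta_0}\Big) \\
&\times\varphi\big((X_1,\xi_1),P_\theta(\xi_1)\circ P_\theta(\xi_0)h(x)\big)\Big\}.
\end{aligned}$$

By using an argument similar to that of Lemma 2, we have, for sufficiently
small $|\alpha|$, $T_1(\alpha)$ is a bounded and analytic operator. Let $\lambda_1(\alpha)$ be the
eigenvalue of $T_1(\alpha)$ corresponding to a one-dimensional eigenspace. Define
$\gamma_j$ as that in Lemma 2(v). By conditions C1-C5 and Lemma 4, it is easy to
see that

(7.10)  $\gamma_j = \frac{\partial}{\partial\alpha_j}\lambda_1(\alpha)\Big|_{\alpha=0} = E_\Pi\left(\frac{\partial}{\partial\theta_j}\log\|P_\theta(\xi_1)\circ P_\theta(\xi_0)\pi\|\Big|_{\theta=\theta_0}\right) = 0.$

By Corollary 1, we have

(7.11)  $\frac{1}{\sqrt{n}}\big(l_j'(\theta_0)\big)_{j=1,\dots,q} \longrightarrow N(0,\Sigma(\theta_0))$ in distribution,

where the variance-covariance matrix

(7.12)  $\Sigma(\theta_0) = (\Sigma_{ij}(\theta_0)) = \left(\frac{\partial^2\lambda_1(\alpha)}{\partial\alpha_i\,\partial\alpha_j}\Big|_{\alpha=0}\right)_{i,j=1,\dots,q}.$

In the following, we will verify that the variance-covariance matrix $\Sigma(\theta_0)$
defined as (7.12) is the Fisher information matrix $I(\theta_0)$. By Lemma 2 and
Corollary 1, we have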



d_ 



o 



log ||M n 7r| 



_d_ 



log ||M n 7r| 



d 2 



n 



daj dak 



A* (a) 



o=0 



as n 



oo. Therefore, 

d 2 



daj da, 
lim -E: 

n— too n 



-A* (a) 



o=0 



J-log||M„vr| 



e=e 



^log||M„vr| 



e=e 



lim ■ 

n— >oo 



-E 



9o 



n 



d 2 



■log||M„7r| 



dOj dO k 



dOjdOk 

io g ||p e (ei)op e (e )^i 



0=00 



(^-log||P e (6)°P,(£oH 



d_ 
dK 



io g ||p e (ei)op e (e )^i 



--0o 



--0o 



□ 



APPENDIX 

Proofs of Lemma 1 and Theorem 2. In the following proofs we will use
the same notation as in Sections 3 and 4 unless specified otherwise. Without
loss of generality, in this section we consider the case $M_0 = \mathrm{Id}$, the identity,
so that the transition probability $P$ of the Markov chain $\{(Y_n,M_n), n \ge 0\}$
depends on the initial state $Y_0 = y$ only. Denote it as $P_y$, and let $E_y$ be the
corresponding expectation. To prove Lemma 1, we first need the following lemma.

Lemma A.1. Let $\{(Y_n,M_n), n \ge 0\}$ be the MIRFS of Lipschitz functions
defined in (2.1) satisfying Assumption K. There exists $0 < \delta_0 < 1$ such that,
for all $\delta \le \delta_0$, there exist $K > 0$ and $0 < \eta < 1$, so that

$\sup_{y}E_y\left(\frac{d(M_n^u,M_n^v)}{d(u,v)}\,\frac{w(Y_n)}{w(y)}\right)^{\delta} \le K\eta^{n}$, for all $n \in \mathbf{N}$ and $u, v \in \mathbf{M}$.

Proof. For given $0 < \delta < 1$ and $y \in \mathcal{Y}$, denote

$c_n(y) = \sup_{u,v\in\mathbf{M}}E_y\left(\frac{d(M_n^u,M_n^v)}{d(u,v)}\,\frac{w(Y_n)}{w(y)}\right)^{\delta}$

and let $\eta_n = \sup\{c_n(y), y \in \mathcal{Y}\}$. Denote $u_m = M_m^u$ and $v_m = M_m^v$. Let $\mathcal{F}_m$ be
the $\sigma$-algebra generated by $\{(Y_k,M_k), 0 \le k \le m\}$. Then



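$$\begin{aligned}
E_y\left(\frac{d(M_{n+m}^u,M_{n+m}^v)}{d(u,v)}\,\frac{w(Y_{n+m})}{w(y)}\right)^{\delta}
&= E_y\left\{E\left[\left(\frac{d(F_{n:m}(M_m^u),F_{n:m}(M_m^v))}{d(u,v)}\,\frac{w(Y_{n+m})}{w(y)}\right)^{\delta}\Big|\,\mathcal{F}_m\right]\right\} \\
&= E_y\left\{\left(\frac{d(M_m^u,M_m^v)}{d(u,v)}\,\frac{w(Y_m)}{w(y)}\right)^{\delta}E\left[\left(\frac{d(F_{n:m}(u_m),F_{n:m}(v_m))}{d(u_m,v_m)}\,\frac{w(Y_{n+m})}{w(Y_m)}\right)^{\delta}\Big|\,\mathcal{F}_m\right]\right\} \\
&\le \eta_n\,E_y\left(\frac{d(M_m^u,M_m^v)}{d(u,v)}\,\frac{w(Y_m)}{w(y)}\right)^{\delta}.
\end{aligned}$$

This implies that

$E_y\left(\frac{d(M_{n+m}^u,M_{n+m}^v)}{d(u,v)}\,\frac{w(Y_{n+m})}{w(y)}\right)^{\delta} \le \eta_n\,E_y\left(\frac{d(M_m^u,M_m^v)}{d(u,v)}\,\frac{w(Y_m)}{w(y)}\right)^{\delta},$

or $\eta_{n+m} \le \eta_n\eta_m$. Therefore,

(A.1)  $\lim_{n\to\infty}\eta_n^{1/n} = \inf\{\eta_n^{1/n}, n \in \mathbf{N}\}.$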



It is known by Assumption K2 that there exist p > 1 and d > such 
that S u P2/ E,{log(^P^)} < -d < 0. Along with sup.E,^} < 

00 by (4.2) and sup^ E y { ^^J^ 1 - } < 00 by Assumption K3, we have 



?7 P < supEj, 



p W(Y p ) 
w(y) 



sup E y <j exp ( 5G P + 5 log 



w(y) 



< oo, 



where G p = p log l(Fi). 

Since $e^{y} \le 1 + y + y^2e^{|y|}/2$, we have, for $y \in \mathcal{Y}$, $u, v \in \mathbf{M}$,

•d(M-,M;)^(y p )^ 



E, 



d(u,v) w(y) 



< l + 5E y { lo 
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d(u,v) w(y) 



+ 5 2 EJ G b + 1o, 



w{y) 



exp ( 5G P + 5 log 



For u, v £ M, we have 



r/p < 1 - d£ + <r supE^ G p + lo, 



w{y) 



exp I <5G P + 5 log 



w(Y p ) 



w(y) 



Therefore, we can choose $\delta_0 > 0$ small enough so that $\eta_p < 1$. Along with
(A.1), we obtain the proof. □

Proof of Lemma 1. For given $\varphi \in \mathcal{H}$, $y \in \mathcal{Y}$, and $u, v \in \mathbf{M}$, if $m < n$,
we have, for $0 < \delta < \delta_0 < 1$,

$|T^n\varphi(y,u) - E_y\varphi(Y_n,F_{n:m}(v))|/w(y)$

$= |E_y\varphi(Y_n,M_n^u) - E_y\varphi(Y_n,F_{n:m}(v))|/w(y)$

$\le \|\varphi\|_h\,E_y\{d(M_n^u,F_{n:m}(v))^{\delta}\,w(Y_n)^{\delta}\}/w(y)$



d(F n:m (M^_ m ),F n:m (v)) 



w{Y n ) 
w{Y n _ n 



< \\ip\\ h E y \ sup E Y , 



< H^llh.Ey^ sup Ey n _ 



w(Y n _ m ) 



w(Y n . 



w(Y n . 



w(y) 



/ rf(M£,M£) w(Y n ) 

—m J 



Si 



w(y) 

w(Y n _„ 
w{y) 



Note that in the last inequality we use $d(u,v) \le 1$ and $w(y) \ge 1$ for all $y \in \mathcal{Y}$.
By making use of Lemma A.1, and $\sup_{y\in\mathcal{Y}}E_y[w(Y_1)/w(y)] < \infty$ in (4.2),
there exist $K > 0$ and $0 < \eta < 1$ such that

(A.2)  $|T^n\varphi(y,u) - E_y\varphi(Y_n,F_{n:m}(v))|/w(y) \le \|\varphi\|_h K\eta^{m} \le \|\varphi\|_{wh}K\eta^{m}.$

Denote $h(y) = E_y\varphi(Y_m,F_m(v))$. Then by assumption (4.1), there exist
$\gamma > 0$ and $0 < \rho < 1$ such that

(A.3)
$$\begin{aligned}
|E_y\varphi(Y_n,F_{n:m}(v)) - E_\Pi\varphi(Y_m,F_m(v))|/w(y)
&\le |E_y\{E_{Y_{n-m}}\varphi(Y_m,F_m(v))\} - E_\Pi\varphi(Y_m,F_m(v))|/w(y) \\
&\le \left|E_yh(Y_{n-m}) - \int h(y)\,\Pi(dy)\right|\Big/w(y) \\
&\le \gamma\|\varphi\|_{wh}\rho^{\,n-m}.
\end{aligned}$$

For given $m, k \in \mathbf{N}$, by using Lemma A.1 again we have

$$\begin{aligned}
|E_\Pi\varphi(Y_m,F_m(v)) - E_\Pi\varphi(Y_{m+k},F_{m+k}(v))|/w(y)
&\le E_\Pi\{|\varphi(Y_{m+k},F_{m+k:k}(v)) - \varphi(Y_{m+k},F_{m+k:k}(M_k^v))|\}/w(y) \\
&\le \|\varphi\|_h\,E_\Pi\{d(F_{m+k:k}(v),F_{m+k:k}(M_k^v))^{\delta}\,w(Y_{m+k})^{\delta}\}/w(y) \\
&\le \|\varphi\|_{wh}K\eta^{m}.
\end{aligned}$$

By making use of (A.2), (A.3) and the above inequality, we have that for
any given $n > m$, $k \ge 0$, and for all $u, v \in \mathbf{M}$,

$|T^n\varphi(y,u) - E_\Pi\varphi(Y_{m+k},F_{m+k}(v))|/w(y) \le \|\varphi\|_{wh}(2K\eta^{m} + \gamma\rho^{\,n-m}).$

By setting $m = n/2$, we have that there exist $A > 0$ and $0 < r < 1$ such that

(A.4)  $\|T^n\varphi(y,u) - Q\varphi(y,u)\|_w \le \|\varphi\|_{wh}Ar^{n}.$

On the other hand, for $u, v \in \mathbf{M}$,

(A.5)
$$\begin{aligned}
\frac{|(T^n-Q)\varphi(y,u) - (T^n-Q)\varphi(y,v)|}{(w(y)\,d(u,v))^{\delta}}
&= \left|E_y\varphi(Y_n,M_n^u) - \int\varphi(y,u)\,\Pi(dy\times du) - E_y\varphi(Y_n,M_n^v) + \int\varphi(y,v)\,\Pi(dy\times dv)\right|\times[(w(y)\,d(u,v))^{\delta}]^{-1} \\
&\le \frac{E_y\{|\varphi(Y_n,M_n^u) - \varphi(Y_n,M_n^v)|\}}{(w(y)\,d(u,v))^{\delta}} \\
&\le \|\varphi\|_{wh}\,E_y\left(\frac{d(M_n^u,M_n^v)}{d(u,v)}\,\frac{w(Y_n)}{w(y)}\right)^{\delta} \\
&\le \|\varphi\|_{wh}K\eta^{n} \quad\text{by Lemma A.1.}
\end{aligned}$$

Denote $\rho_* = \min\{\eta,r\}$ and $\gamma_* = A + K$. Combine (A.4) and (A.5) to get

$\|T^n - Q\|_{wh} = \sup_{\|\varphi\|_{wh}\le1}\|T^n\varphi - Q\varphi\|_{wh} \le \sup_{\|\varphi\|_{wh}\le1}\|\varphi\|_{wh}\gamma_*\rho_*^{n} \le \gamma_*\rho_*^{n}.$

Then we have (4.11) and this completes the proof. □

Proof of Theorem 2. By using Lemma 2, standard arguments involving
smoothing inequalities and Fourier inversion (cf. Chapter 4 of [5])
reduce the proof to that of showing for every $\delta > 0$, $a > 0$ and $b \ge 1$,

(A.6)  $\sup_{\delta\le|\alpha|\le n^{a}}|E_\Pi(e^{i\alpha'S_n})| = o(n^{-b}).$

To prove (A.6), we follow the same idea as (3.43) of [31], letting
$\zeta_t = S_t - S_{t-1}$ ($t = 1,2,\dots$), $\zeta_0 = S_0$, and considering the conditional
characteristic function $E\{e^{i\alpha'\zeta_1}\mid(Y_0 = y, M_0 = u),(Y_1 = y', M_1 = v)\}$.

Let $J = \{1,\dots,n\}$, and fix $m \ge 1$ to be determined later. Divide $J$ into
blocks $A_1,B_1,\dots,A_l,B_l$ as follows. Define $j_1,\dots,j_l$ by $j_1 = 1$, and
$j_{k+1} = \inf\{j \ge j_k + 7m\colon j \in J\}$, and let $l$ be the smallest integer for which
the inf is undefined. Write

$A_k = \prod\{e^{n^{-1/2}i\alpha'\zeta_j}\colon |j - j_k| \le m\}, \qquad k = 1,\dots,l,$

$B_k = \prod\{e^{n^{-1/2}i\alpha'\zeta_j}\colon j_k + m + 1 \le j \le j_{k+1} - m - 1\}, \qquad k = 1,\dots,l-1.$


Then $e^{i\alpha'S_n} = \prod_{k=1}^{l}A_kB_k$. Given $y \in \mathcal{Y}$, we have
^ J] A k B k E vll B k E(A k \Cj :j / j k ) 



(A.7) 



i 



9-1 J 

E y l[A k B k (A g - E{A k \Cj:3^j q ))J{B k E{A k \Q:j^j k ) 

1 9+1 

By using Lemma 2(iv), there exists $\delta > 0$ such that
$E|E(A_k\mid\zeta_j\colon j \ne j_q) - E(A_k\mid\zeta_j\colon 0 < |j - j_k| \le 3m)| \le e^{-\delta m}$. Therefore, (A.7) $\le$



l 

9=1 



Z 

£ 

9=1 



9-1 



E y HA k B k (A q -E(A k \Cj:j^j q )) 



i 



(A. 



-8m 



x n^(^ICj:0<|i-i fe |<3m) 

9+1 
I 

9=1 

The first summation term in (A.8) vanishes since $\prod_{k=1}^{q-1}A_kB_k$ and
$\prod_{k=q+1}^{l}B_k\,E(A_k\mid\zeta_j\colon 0 < |j - j_k| \le 3m)$ are both measurable with respect to
the $\sigma$-field generated by $\zeta_j\colon j \ne j_q$.

Recall that the functions $E(A_k\mid\zeta_j\colon 0 < |j - j_k| \le 3m)$, for $k = 1,\dots,l$, are
weakly dependent since $j_{k+1} - j_k \ge 7m$, $k = 1,\dots,l-1$. Using Assumption
K1, (4.14) and (4.15), we obtain



E y ]J B k E{A k | Cj : < | j - < 3m) 

^(A fc |Ci:0<|i-j fc |<3m) 



< fj ^|^(A fc |Cj : < \j - j k \ < 3m) | + Ze" 



8m 



With the strong nonlattice condition (4.16), and the conditional strong
nonlattice condition (4.17), we find an upper bound for
$E_y|E(A_k\mid\zeta_j\colon 0 < |j - j_k| \le 3m)|$.

We have for $|\alpha| > \delta$ the relation $E_y|E(A_k\mid\zeta_j\colon j \ne j_q)| \le e^{-\delta}$ and, hence,
by (4.17) for all $\alpha \in \mathbf{R}^p$, $|\alpha| \le \delta$, $E_y|E(A_k\mid\zeta_j\colon j \ne j_q)| \le \exp(-\delta|\alpha|^2/n)$.
Therefore, for all $\alpha \in \mathbf{R}^p$,

$E_y|E(A_k\mid\zeta_j\colon 0 < |j - j_k| \le 3m)| \le e^{-\delta m} + E_y|E(A_k\mid\zeta_j\colon j \ne j_q)| \le e^{-\delta m} + \max(\exp(-\delta|\alpha|^2/n),\,e^{-\delta}).$

If we choose $K$ appropriately and let $m$ be the integral part of $K\log n$,
then the assertion of the lemma follows from
$\exp(-\delta|\alpha|^2/n)^{n/m} \le \exp(-\delta|\alpha|^2/(K\log n)) \le \exp(-\delta n^{\varepsilon/2})$
for $|\alpha| > cn^{\varepsilon}$ and some $\delta > 0$. □
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1. Introduction. Upon reading the paper Efficient Likelihood Estimation 
in State Space Models by Cheng-Der Fuh I found a number of problems in the 
formulations and a number of mathematical errors. Together, these findings 
cast doubt on the validity of the main results in their present formulation. 
A reformulation and new proofs seem quite involved. 

The paper Efficient Likelihood Estimation in State Space Models deals
with asymptotic properties of the maximum likelihood estimate in hidden
Markov models. The hidden Markov chain is $X_n$, and the observed process
is $\xi_n$, where $\xi_n$ conditioned on the past and the hidden process depends on
$(X_n,\xi_{n-1})$ only. The approach used is to add an iterated function system
$M_n$, and to consider the Markov process $(X_n,\xi_n,M_n)$. This is very much
akin to the method in Douc and Matias [1], and I will use this article as a
background for my comments.

2. Problems.

2.1. Definition of iterated function system. The first basic definition in
the paper is a function $P_\theta(\xi_j)\colon \mathbf{M} \to \mathbf{M}$ that maps a function $h \in \mathbf{M}$ into a
new function in $\mathbf{M}$ (page 2031),

$P_\theta(\xi_j)h(x) = \int_{y\in\mathcal{X}}p_\theta(x,y)f(\xi_j;\theta|y,\xi_{j-1})h(y)\,m(dy).$

[It is unclear why the author states that $P_\theta(\xi_j)$ is a function on $(\mathcal{X}\times\mathbf{R}^d)\times\mathbf{M}$,
where $\mathcal{X}$ is the state space of the Markov chain.] The paper next defines the
composition $P_\theta(\xi_{j+1})\circ P_\theta(\xi_j)h$ by first applying $P_\theta(\xi_{j+1})$ to $h$ and then
applying $P_\theta(\xi_j)$ to the result. Using these two definitions we have

$$P_\theta(\xi_n)\circ\cdots\circ P_\theta(\xi_1)\circ P_\theta(\xi_0)\pi_\theta = \int\cdots\int\pi_\theta(x_n)\left\{\prod_{j=1}^{n}p_\theta(x_{j-1},x_j)f(\xi_j;\theta|x_j,\xi_{j-1})\,m(dx_j)\right\}f(\xi_0;\theta|x_0)\,m(dx_0).$$

The argument presented in the paper then appears to assume that this
expression depends on some $x$ and performs an integration before claiming
that the result is the joint density $p_n(\xi_0,\dots,\xi_n;\theta)$. This is clearly not correct
since $\pi_\theta(x_n)$ appears in the expression instead of $\pi_\theta(x_0)$.

Following the work of Douc and Matias [1] one would instead use the
definition

(1)  $P_\theta(\xi_j)h(x) = \int_{y\in\mathcal{X}}p_\theta(y,x)f(\xi_j;\theta|x,\xi_{j-1})h(y)\,m(dy),$

that is, the integration is with respect to the first variable in $p_\theta(y,x)$ instead
of the second. Changing the definition of $P_\theta(\xi_0)$ correspondingly and using
ordinary composition of functions, one finds that $p_n(\xi_0,\dots,\xi_n;\theta)$ equals the
integral of $P_\theta(\xi_n)\circ\cdots\circ P_\theta(\xi_1)\circ P_\theta(\xi_0)\pi_\theta$ with respect to $x_{n+1}$. However,
making this change necessitates a new proof for the first part of Lemma
3 on page 2056. Comparing with Douc and Matias ([1], Proposition 1) we
see that this is one of the places where the latter authors use the stronger
assumptions of that paper on the Markov chain.

Turning to the iterated function system, Fuh's paper defines this as

$M_n = P_\theta(\xi_n)\circ\cdots\circ P_\theta(\xi_1)\circ P_\theta(\xi_0)$

[formula (5.6), page 2045]. Taking this literally, and using the definitions
in Fuh's paper, this is actually a mapping that takes a function as input
and turns it into a constant. Instead $M_n$ should be a function obtained by
applying a mapping to $M_{n-1}$. This is achieved when using the definition
suggested in (1) and adding $\pi_\theta$ to the right-hand side of $M_n$ above.

2.2. Harris recurrence of iterated function. Whether or not we make the
changes suggested in the previous subsection, $M_n$, defined on page 2045, is
related to the density of $(\xi_0,\dots,\xi_n)$. Making the change suggested in (1)
above we have precisely $M_n(x_{n+1}) = p(x_{n+1},\xi_0,\dots,\xi_n)$. Such an expression
will typically tend to either zero or infinity. However, in Lemma 4 on page
2046 Fuh claims that $(X_n,\xi_n,M_n)$ is a Harris recurrent Markov chain. It is
difficult to pinpoint the exact origin of this problem. The Harris recurrence
is established in Lemma 4, which in its formulation uses a measure $Q$ from
Theorem 1 (in the formulation there are two $Q$'s, but these are different).
So we need to establish Theorem 1 before proving Lemma 4. In Lemma 3 it
is stated that the Markov iterated function system satisfies Assumption K.
In Remark 1 (page 2035) Fuh says that Assumption K is different from the
assumptions of Theorem 1. He then goes on to say that if Assumption K is
supplemented with the extra assumption that $(Y_n,M_n)$ is a Harris recurrent
Markov chain, then Theorem 1 still holds. This therefore looks like a
circular argument.

Comparing again with Douc and Matias [1], they consider instead
$M_n(x_{n+1}) = p(x_{n+1}\mid\xi_0,\dots,\xi_n)$. However, if we make this change we have
introduced a new iterated function system, and a revised version of Lemma 3
is needed, which presumably will lead to a different set of assumptions.

2.3. Asymptotic properties of score function and observed information.
The asymptotic normality of the score function is stated in Lemma 6 (page
2048). In the proof of Lemma 6 (page 2060) the author appeals to Corollary
1. The latter gives a central limit theorem for a sum of the form
$\sum_{j=1}^{n}g(M_{j-1},M_j)$. However, the paper wants to use this result on the sum
$\sum_{j=1}^{n}g_\theta(M_{j-1},M_j)$. This looks innocent, but since $\theta$ appears in the
iteration of $M_n$ this is not of the form $\sum_{j=1}^{n}g(M_{j-1},M_j)$. Instead one needs to
consider a new iterated function system. This is what is done in Appendix D
of Douc and Matias [1].

Similarly, it is stated that the proof of the main Theorem 5 follows a
standard argument. However, comparing with Douc and Matias [1] (Appendix
D.3) it seems that yet another iterated function system is needed to deal
with the convergence of the observed information.

2.4. Generality of conditions. Assumption C5 on page 2043 restricts the
dependency of the observed process on the hidden process. For the example
considered in (b) on page 2044 one needs to consider

$$\begin{aligned}
\sup_{y,z\in\mathcal{X}}\frac{f(\xi_0;\theta|y)f(\xi_1;\theta|y,\xi_0)}{f(\xi_0;\theta|z)f(\xi_1;\theta|z,\xi_0)}
&= \sup_{y,z\in\mathcal{X}}\frac{\exp\{-\tfrac12(\xi_0-y)^2-\tfrac12(\xi_1-y)^2\}}{\exp\{-\tfrac12(\xi_0-z)^2-\tfrac12(\xi_1-z)^2\}} \\
&= \sup_{y,z\in\mathcal{X}}\exp\{z^2-y^2+(\xi_0+\xi_1)(y-z)\} = \infty.
\end{aligned}$$

Thus C5 is not satisfied (this seems to be contrary to the claim on page 2054
line 8 from the bottom).
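
The algebra behind this display can be checked numerically. The sketch below, with hypothetical function names `log_ratio` and `closed_form`, evaluates the logarithm of the Gaussian ratio directly and through the closed form $z^2 - y^2 + (\xi_0+\xi_1)(y-z)$, and shows that it grows without bound along $y = 0$, $z \to \infty$; the particular values of $\xi_0$ and $\xi_1$ are arbitrary.

```python
def log_ratio(y, z, xi0, xi1):
    """Log of the ratio of unnormalized Gaussian densities in the display."""
    num = -0.5 * (xi0 - y) ** 2 - 0.5 * (xi1 - y) ** 2
    den = -0.5 * (xi0 - z) ** 2 - 0.5 * (xi1 - z) ** 2
    return num - den


def closed_form(y, z, xi0, xi1):
    """Simplified exponent: z^2 - y^2 + (xi0 + xi1)(y - z)."""
    return z * z - y * y + (xi0 + xi1) * (y - z)


if __name__ == "__main__":
    xi0, xi1 = 0.5, 2.0
    print(log_ratio(0.3, -1.2, xi0, xi1), closed_form(0.3, -1.2, xi0, xi1))
    # Along y = 0 and increasing z the log-ratio is unbounded.
    for t in (10.0, 100.0, 1000.0):
        print(t, log_ratio(0.0, t, xi0, xi1))
```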
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Problem 2.1. Definition of iterated function system. 



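(2.6)  $P_\theta(\xi_j)h(x) = \int_{y\in\mathcal{X}}p_\theta(y,x)f(\xi_j;\theta|x,\xi_{j-1})h(y)\,m(dy).$

Define the composition of two random functions as

(2.7)  $$P_\theta(\xi_{j+1})\circ P_\theta(\xi_j)h(x) = \int_{z\in\mathcal{X}}p_\theta(z,x)f(\xi_{j+1};\theta|x,\xi_j)\left(\int_{y\in\mathcal{X}}p_\theta(y,z)f(\xi_j;\theta|z,\xi_{j-1})h(y)\,m(dy)\right)m(dz).$$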

Page 2042. C1. ... for all $s_0, s_1 \in \mathbf{R}^d$, and $\sup_{x\in\mathcal{X}}\int p_\theta(y,x)\,m(dy) < \infty$.
Since $m$ is $\sigma$-finite, there exist pairwise disjoint $\mathcal{X}_n$ such that
$\mathcal{X} = \bigcup_{n=1}^{\infty}\mathcal{X}_n$, and $0 < m(\mathcal{X}_n) < \infty$. Assume
$E[\sum_{n=1}^{\infty}2^n\sup_{x\in\mathcal{X}_n}f(\xi_1;\theta|x,s_0)] < \infty$ for all $s_0 \in \mathbf{R}^d$. Denote
$g_\theta(\xi_0,\xi_1) = \sup_{x\in\mathcal{X}}\int p_\theta(y,x)f(\xi_1;\theta|x,\xi_0)\,m(dy)$. Furthermore, we assume
that there exists $p \ge 1$ as in K2 such that

(5.2)  $\sup_{(x_0,s_0)}E_{(x_0,s_0)}\left\{\log\left(g_\theta(s_0,\xi_1)\cdots g_\theta(\xi_{p-1},\xi_p)\,\frac{w(X_p,\xi_p)}{w(x_0,s_0)}\right)\right\} < 0.$

The example on Page 2044, L12, holds if $a \ne 0$. The original (5.6) was wrong;
it should be

(5.6)  $M_n := P_\theta(\xi_n)\circ\cdots\circ P_\theta(\xi_1)\circ P_\theta(\xi_0)\pi$ (page 2045).

Page 2046. Lemma 3. ... Furthermore, under conditions C1, C6-C9,
the function $g$ defined in (5.7) belongs to $C(Q\times Q)$.

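Proof of Lemma 3. We consider only the case of $P(\xi_1)$, since the cases
of $P(\xi_0)$ and $P(\xi_j)$, for $j = 2,\dots,n$, are straightforward consequences. For
any two elements $h_1, h_2 \in \mathbf{M}$, and two fixed elements $s_0, s_1 \in \mathbf{R}^d$, by (5.8)
we have

$$\begin{aligned}
d(P(s_1)h_1,P(s_1)h_2)
&= \sup_{x\in\mathcal{X}}\left|\int p_\theta(y,x)f(s_1;\theta|x,s_0)h_1(y)\,m(dy) - \int p_\theta(y,x)f(s_1;\theta|x,s_0)h_2(y)\,m(dy)\right| \\
&\le d(h_1,h_2)\sup_{x\in\mathcal{X}}\int p_\theta(y,x)f(s_1;\theta|x,s_0)\,m(dy) \\
&\le C\left(\sup_{x\in\mathcal{X}}\int p_\theta(y,x)\,m(dy)\right)d(h_1,h_2),
\end{aligned}$$

where $0 < C = \sup_{x\in\mathcal{X}}f(s_1;\theta|x,s_0) < \infty$ is a constant by assumption C1.
Note that $\sup_{x\in\mathcal{X}}\int p_\theta(y,x)\,m(dy) < \infty$ by assumption C1. The equality
holds only if $h_1 = h_2$ $m$-almost surely. This proves the condition of
Lipschitz continuity in the second argument.

Note that C1 implies that K1 holds. Recall that
$M_n = P(\xi_n)\circ\cdots\circ P(\xi_1)\circ P(\xi_0)\pi$ for $\pi \in \mathbf{M}$ in (5.6). To prove the weighted
mean contraction property K2, we observe that for $p \ge 1$,

(7.1)
$$\begin{aligned}
\sup_{x_0,s_0}E_{(x_0,s_0)}\left\{\log\left(L_p\,\frac{w(X_p,\xi_p)}{w(x_0,s_0)}\right)\right\}
&= \sup_{x_0,s_0}E_{(x_0,s_0)}\left\{\log\left(\sup_{h_1\ne h_2}\frac{d(M_ph_1,M_ph_2)}{d(h_1,h_2)}\,\frac{w(X_p,\xi_p)}{w(x_0,s_0)}\right)\right\} \\
&\le \sup_{x_0,s_0}E_{(x_0,s_0)}\left\{\log\left(\prod_{j=1}^{p}\sup_{x_j\in\mathcal{X}}\int p_\theta(x_{j-1},x_j)f(\xi_j;\theta|x_j,\xi_{j-1})\,m(dx_{j-1})\times\frac{w(X_p,\xi_p)}{w(x_0,s_0)}\right)\right\} < 0.
\end{aligned}$$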



CORRIGENDUM 



3 



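The last inequality follows from (5.2) in condition C1.

To verify that assumption K3 holds, as $m$ is $\sigma$-finite, we have
$\mathcal{X} = \bigcup_{n=1}^{\infty}\mathcal{X}_n$, where the $\mathcal{X}_n$ are pairwise disjoint and
$0 < m(\mathcal{X}_n) < \infty$. Set

(7.2)  $h(x) = \sum_{n=1}^{\infty}\frac{I_{\mathcal{X}_n}(x)}{2^n\,m(\mathcal{X}_n)}.$

It is easy to see that $\int_{x\in\mathcal{X}}h(x)\,m(dx) = 1$ and hence $h$ belongs to $\mathbf{M}$. Observe
that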



Bd 2 (P(^)h,h) 



E 



(7.3) 



sup 

xi&X 



p 9 (x ,xi)f^ 1 ;9\x 1 ,so)h(x )m(dx ) - h(xi) 



<E 



iJ^ su p /(£i;%i> s o) SUp / 

n=l x x£Xn J Yxi&X J 



sup / pe(x ,xi)m(dx ) 

xiGX 



+ sup \h(xi)\. 

X]_£X 



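Note that $h(x)$ is piecewise constant by definition (7.2),
$E[\sum_{n=1}^{\infty}2^n\sup_{x\in\mathcal{X}_n}f(\xi_1;\theta|x,s_0)] < \infty$ for all $s_0 \in \mathbf{R}^d$ by assumption C1,
and $p_\theta(x_0,x_1)$ is integrable in $x_0$ over the subset $\mathcal{X}_n$ by assumption C1.
These imply that (7.3) is finite.
Finally, we observe

$$\begin{aligned}
\sup_{x_0,s_0}E_{(x_0,s_0)}\left\{L_1\,\frac{w(X_1,\xi_1)}{w(x_0,s_0)}\right\}
&= \sup_{x_0,s_0}E_{(x_0,s_0)}\left\{\sup_{h_1\ne h_2}\frac{d(P(\xi_1)h_1,P(\xi_1)h_2)}{d(h_1,h_2)}\,\frac{w(X_1,\xi_1)}{w(x_0,s_0)}\right\} \\
&\le \sup_{x_0,s_0}E_{(x_0,s_0)}\left\{\left(\sup_{x_1\in\mathcal{X}}\int p_\theta(x_0,x_1)f(\xi_1;\theta|x_1,s_0)\,m(dx_0)\right)\frac{w(X_1,\xi_1)}{w(x_0,s_0)}\right\} < \infty.
\end{aligned}$$

The last inequality follows from (5.3) in condition C1.

Note that C8 and C9 imply that $g \in C(Q\times Q)$. Hence, the proof is
complete. □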



Problem 2.2. Harris recurrence of iterated function. This paper is an
extension of Fuh (2003) for finite state space, in which the likelihood
function can be expressed as the $L_1$-norm of products of Markovian random
matrices. Note that $M_n$ defined in (5.6) is an iterated random functions
system governed by a Markov chain $Y_n$, and $Y_n = (X_n,\xi_n)$ in the state space
models case. In Theorem 1 I only assume $Y_n = (X_n,\xi_n)$ is Harris recurrent.
The purpose of the statement, "Note that under K1-K3, ... a Markovian
iterated random functions system in Theorem 2," is to relate Theorems 1
and 2, so that I can apply limit theorems for Markov chains to obtain the law
of large numbers and central limit theorem (and Edgeworth expansion) for
$(Y_n,M_n)$.

In Lemma 4 I want to prove $Z_n = ((X_n,\xi_n),M_n)$ is Harris recurrent ($Z_n$
is defined in lines 1 and 2 on page 2056). In the proof, I can use the results
in Theorem 1 since only $Y_n = (X_n,\xi_n)$ is assumed to be Harris recurrent
in Theorem 1. It is known that C1 implies that $Y_n = (X_n,\xi_n)$ is Harris
recurrent. A new proof of Lemma 3 was given on pages 1 and 2.

Problem 2.3. Asymptotic properties of score function and observed in- 
formation. Page 2060, L12. In the proof of Lemma 6, (7.9) defined a new 
iterated functions system; therefore Corollary 1 cannot be used directly. The 
same situation happens for Theorems 5 and 7. The rigorous proofs of these 
results will be given in a separate paper. 

Problem 2.4. Generality of conditions. C5. For $\theta \in N_\delta(\theta_0)$,
e fdtogfycx n(x)p(x, y)f(s ; 6\x)f(£r,0\y, s )m(dy) 

M w t 

for all i = 1, . . . , q. 

Change C5 accordingly. It is straightforward to check that C5 holds for 
the examples considered in Section 6. The proof of Lemma 5 can be done 
under C5. 

Other typos and mistakes. Page 2032, L1. $\cdots\,p_\theta(y,x)f(\xi_j;\theta|x,\xi_{j-1})\,\cdots$
$X_{j-1} = y$ and $X_j \in dx$, ...

(3.7)  $\pi(y)P(Y_n\in dz, M_n\in\cdot\mid Y_0 = y) = \pi(z)P(Y_n\in dy, M_n\in\cdot\mid Y_0 = z).$

Page 2028, L5. $(1 - a^2)$. Page 2043, C7, $\theta \to \varphi_x(\theta)$ was a typo; delete it.
Page 2047, L1, then, "each component of" the Fisher information matrix.
L5, replace "positive definite" by "finite." Page 2048, Theorem 5, assume
$I(\theta_0)$ is invertible. Page 2057, L3, the notation $m \times Q \times Q$ may be confusing;
change it to $m \times Q \times Q$.
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